Preface


It was our privilege to serve as the program chairs for CAV 2019, the 31st International 
Conference on Computer-Aided Verification. CAV 2019 was held in New York, USA, 
during July 15-18, 2019. The tutorial day was on July 14, 2019, and the pre-conference 
workshops were held during July 13-14, 2019. All events took place in The New 
School in New York City. 

CAV is an annual conference dedicated to the advancement of the theory and 
practice of computer-aided formal analysis methods for hardware and software sys- 
tems. The primary focus of CAV is to extend the frontiers of verification techniques by 
expanding to new domains such as security, quantum computing, and machine 
learning. This puts CAV at the cutting edge of formal methods research, and this year’s
program is a reflection of this commitment. 

CAV 2019 received a very high number of submissions (258). We accepted 13 tool 
papers, two case studies, and 52 regular papers, which amounts to an acceptance rate of 
roughly 26%. The accepted papers cover a wide spectrum of topics, from theoretical 
results to applications of formal methods. These papers apply or extend formal methods 
to a wide range of domains such as concurrency, learning, and industrially deployed 
systems. The program featured invited talks by Dawn Song (UC Berkeley), Swarat 
Chaudhuri (Rice University), and Ken McMillan (Microsoft Research) as well as 
invited tutorials by Emina Torlak (University of Washington) and Ranjit Jhala (UC San 
Diego). Furthermore, we continued the tradition of Logic Lounge, a series of discus- 
sions on computer science topics targeting a general audience. 

In addition to the main conference, CAV 2019 hosted the following workshops: The 
Best of Model Checking (BeMC) in honor of Orna Grumberg, Design and Analysis of 
Robust Systems (DARS), Verification Mentoring Workshop (VMW), Numerical 
Software Verification (NSV), Verified Software: Theories, Tools, and Experiments 
(VSTTE), Democratizing Software Verification, Formal Methods for ML-Enabled 
Autonomous Systems (FoMLAS), and Synthesis (SYNT). 

Organizing a top conference like CAV requires a great deal of effort from the 
community. The Program Committee for CAV 2019 consisted of 79 members; a
committee of this size ensures that each member has a reasonable number of
papers to review in the allotted time. In all, the committee members wrote over 770 reviews while
investing significant effort to maintain and ensure the high quality of the conference 
program. We are grateful to the CAV 2019 Program Committee for their outstanding 
efforts in evaluating the submissions and making sure that each paper got a fair chance. 

Like last year’s CAV, we made artifact evaluation mandatory for tool submissions 
and optional but encouraged for the rest of the accepted papers. The Artifact Evaluation 
Committee consisted of 27 reviewers who put in significant effort to evaluate each 
artifact. The goal of this process was to provide constructive feedback to tool devel- 
opers and help make the research published in CAV more reproducible. The Artifact 
Evaluation Committee was generally quite impressed by the quality of the artifacts, 
and, in fact, all accepted tools passed the artifact evaluation. Among regular papers,
65% of the authors submitted an artifact, and 76% of these artifacts passed the eval- 
uation. We are also very grateful to the Artifact Evaluation Committee for their hard 
work and dedication in evaluating the submitted artifacts. 

CAV 2019 would not have been possible without the tremendous help we received 
from several individuals, and we would like to thank everyone who helped make CAV 
2019 a success. First, we would like to thank Yu Feng and Ruben Martins for chairing 
the Artifact Evaluation Committee and Zvonimir Rakamaric for maintaining the CAV 
website and social media presence. We also thank Oksana Tkachuk for chairing the 
workshop organization process, Peter O’Hearn for managing sponsorship, and Thomas 
Wies for arranging student fellowships. We also thank Loris D’Antoni, Rayna 
Dimitrova, Cezara Dragoi, and Anthony W. Lin for organizing the Verification 
Mentoring Workshop and working closely with us. Last but not least, we would like to 
thank Kostas Ferles, Navid Yaghmazadeh, and members of the CAV Steering 
Committee (Ken McMillan, Aarti Gupta, Orna Grumberg, and Daniel Kroening) for 
helping us with several important aspects of organizing CAV 2019. 

We hope that you will find the proceedings of CAV 2019 scientifically interesting 
and thought-provoking! 
Abstract. Symbolic Finite Automata and Register Automata are two
orthogonal extensions of finite automata motivated by real-world prob- 
lems where data may have unbounded domains. These automata address 
a demand for a model over large or infinite alphabets, respectively. Both 
automata models have interesting applications and have been success- 
ful in their own right. In this paper, we introduce Symbolic Register 
Automata, a new model that combines features from both symbolic and 
register automata, with a view on applications that were previously out 
of reach. We study their properties and provide algorithms for emptiness, 
inclusion and equivalence checking, together with experimental results. 


1 Introduction 


Finite automata are a ubiquitous formalism that is simple enough to model 
many real-life systems and phenomena. They enjoy a large variety of theoret- 
ical properties that in turn play a role in practical applications. For example, 
finite automata are closed under Boolean operations, and have decidable empti- 
ness and equivalence checking procedures. Unfortunately, finite automata have 
a fundamental limitation: they can only operate over finite (and typically small) 
alphabets. Two orthogonal families of automata models have been proposed to 
overcome this: symbolic automata and register automata. In this paper, we show 
that these two models can be combined yielding a new powerful model that can 
cover interesting applications previously out of reach for existing models. 

Symbolic finite automata (SFAs) allow transitions to carry predicates over 
rich first-order alphabet theories, such as linear arithmetic, and therefore extend 
classic automata to operate over infinite alphabets [12]. For example, an SFA can 
define the language of all lists of integers in which the first and last elements are 
positive integer numbers. Despite their increased expressiveness, SFAs enjoy the 
same closure and decidability properties of finite automata—e.g., closure under 
Boolean operations and decidable equivalence and emptiness. 
book TAV Research Award, the ERC starting grant Profoundnet (679127) and a Lev-
Register automata (RA) support infinite alphabets by allowing input characters
to be stored in registers during the computation and to be compared against
existing values that are already stored in the registers [17]. For example, an RA 
can define the language of all lists of integers in which all numbers appearing in 
even positions are the same. RAs do not have some of the properties of finite 
automata (e.g., they cannot be determinized), but they still enjoy many useful 
properties that have made them a popular model in static analysis, software 
verification, and program monitoring [15]. 

In this paper, we combine the best features of these two models (first-order
alphabet theories and registers) into a new model, symbolic register automata
(SRA). SRAs are strictly more expressive than SFAs and RAs. For example,
an SRA can define the language of all lists of integers in which the first and
last elements are positive integers and all numbers appearing in even
positions are the same. This language is recognizable neither by an SFA nor
by an RA.

While other attempts at combining symbolic automata and registers have 
resulted in undecidable models with limited closure properties [11], we show 
that SRAs enjoy the same closure and decidability properties of (non-symbolic) 
register automata. We propose a new application enabled by SRAs and imple- 
ment our model in an open-source automata library. 

In summary, our contributions are: 


— Symbolic Register Automata (SRA): a new automaton model that can handle 
complex alphabet theories while allowing symbols at arbitrary positions in the 
input string to be compared using equality (Sect. 3). 

— A thorough study of the properties of SRAs. We show that SRAs are closed 
under intersection, union and (deterministic) complementation, and provide 
algorithms for emptiness and forward (bi)simulation (Sect. 4). 

— A study of the effectiveness of our SRA implementation on handling regular 
expressions with back-references (Sect. 5). We compile a set of benchmarks
from existing regular expressions with back-references (e.g., (\d)[a-z]*\1)
and show that SRAs are an effective model for such expressions and existing 
models such as SFAs and RAs are not. Moreover, we show that SRAs are more 
efficient than the java.util.regex library for matching regular expressions 
with back-references. 


2 Motivating Example 


In this section, we illustrate the capabilities of symbolic register automata using 
a simple example. Consider the regular expression rp shown in Fig. 1a. This
expression, given a sequence of product descriptions, checks whether the prod- 
ucts have the same code and lot number. The reader might not be familiar with 
some of the unusual syntax of this expression. In particular, rp uses two back- 
references \1 and \2. The semantics of this construct is that the string matched 
by the regular expression for \1 (resp. \2) should be exactly the string that 
matched the subregular expression r appearing between the first (resp. second) 
C:(.{3}) L:(.) D:[^\s]+( C:\1 L:\2 D:[^\s]+)+
(a) Regular expression rp (with back-reference).

C:X4a L:4 D:bottle C:X4a L:4 D:jar
(b) Example text matched by rp.

C:X4a L:4 D:bottle C:X5a L:4 D:jar
(c) Example text not matched by rp.

[Figure: (d) Snippets of a symbolic register automaton $A_{rp}$, corresponding to rp.]

Fig. 1. Regular expression for matching products with same code and lot number, i.e.,
the characters of C and L are the same in all the products.


two parentheses, in this case (.{3}) (resp. (.)). Back-references allow regular
expressions to check whether the encountered text is the same or is different 
from a string/character that appeared earlier in the input (see Figs. 1b and c for 
examples of positive and negative matches). 

Representing this complex regular expression using an automaton model 
requires addressing several challenges. The expression rp: 


1. operates over large input alphabets consisting of upwards of $2^{18}$ characters;

2. uses complex character classes (e.g., \s) to describe different sets of characters 
in the input; 

3. adopts back-references to detect repeated strings in the input. 


Existing automata models do not address one or more of these challenges. Finite 
automata require one transition for each character in the input alphabet and 
blow up when representing large alphabets. Symbolic finite automata (SFA)
allow transitions to carry predicates over rich structured first-order alphabet 
theories and can describe, for example, character classes [12]. However, SFAs 
cannot directly check whether a character or a string is repeated in the input. 
An SFA for describing the regular expression rp would have to store the charac- 
ters after C: directly in the states to later check whether they match the ones of 
the second product. Hence, the smallest SFA for this example would require bil- 
lions of states! Register automata (RA) and their variants can store characters in 
registers during the computation and compare characters against values already 
stored in the registers [17]. Hence, RAs can check whether the two products have 
the same code. However, RAs only operate over unstructured infinite alphabets 
and cannot check, for example, that a character belongs to a given class. 

The model we propose in this paper, symbolic register automata (SRA), com- 
bines the best features of SFAs and RAs—first-order alphabet theories and 
registers—and can address all the three aforementioned challenges. Figure 1d 
shows a snippet of a symbolic register automaton $A_{rp}$ corresponding to rp. Each
transition in $A_{rp}$ is labeled with a predicate that describes what characters can
trigger the transition. For example, [^\s] denotes that the transition can be triggered
by any non-space character, L denotes that the transition can be triggered
by the character L, and true denotes that the transition can be triggered by any
character. Transitions of the form $\varphi/{\to}r_i$ denote that, if a character $x$ satisfies
the predicate $\varphi$, the character is then stored in the register $r_i$. For example, the
transition out of state 1 reads any character and stores it in register $r_1$. Finally,
transitions of the form $\varphi/{=}r_i$ are triggered if a character $x$ satisfies the predicate
$\varphi$ and $x$ is the same character as the one stored in $r_i$. For example, the
transition out of state 2 can only be triggered by the same character that was
stored in $r_1$ when reading the transition out of state 1, i.e., the first characters in
the product codes should be the same.

SRAs are a natural model for describing regular expressions like rp, where 
capture groups are of bounded length, and hence correspond to finitely-many 
registers. The SRA $A_{rp}$ has fewer than 50 states (vs. more than 100 billion for
SFAs) and can, for example, be used to check whether an input string matches 
the given regular expression (e.g., monitoring). More interestingly, in this paper 
we study the closure and decidability properties of SRAs and provide an imple- 
mentation for our model. For example, consider the following regular expression 
rpc that only checks whether the product codes are the same, but not the lot 
numbers: 


C:(.{3}) L:. D:[^\s]+( C:\1 L:. D:[^\s]+)+


The set of strings accepted by rpc is a superset of the set of strings accepted by
rp. In this paper, we present simulation and bisimulation algorithms that can
check this property. Our implementation can show that rpc subsumes rp in 25 s,
and we could not find other tools that can prove the same property.


3 Symbolic Register Automata 


In this section we introduce some preliminary notions and define symbolic register
automata, together with a variant that will be useful in proving decidability properties.


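Preliminaries. An effective Boolean algebra $\mathcal{A}$ is a tuple
$(\mathcal{D}, \Psi, [\![\_]\!], \bot, \top, \wedge, \vee, \neg)$, where: $\mathcal{D}$ is a set of domain elements; $\Psi$ is a set of predicates
closed under the Boolean connectives, with $\bot, \top \in \Psi$. The denotation function
$[\![\_]\!] \colon \Psi \to 2^{\mathcal{D}}$ is such that $[\![\bot]\!] = \emptyset$, $[\![\top]\!] = \mathcal{D}$, and, for all $\varphi, \psi \in \Psi$,
$[\![\varphi \vee \psi]\!] = [\![\varphi]\!] \cup [\![\psi]\!]$, $[\![\varphi \wedge \psi]\!] = [\![\varphi]\!] \cap [\![\psi]\!]$, and $[\![\neg\varphi]\!] = \mathcal{D} \setminus [\![\varphi]\!]$. For $\varphi \in \Psi$,
we write $\mathit{isSat}(\varphi)$ whenever $[\![\varphi]\!] \neq \emptyset$ and say that $\varphi$ is satisfiable. $\mathcal{A}$ is decidable
if $\mathit{isSat}$ is decidable. For each $a \in \mathcal{D}$, we assume predicates $\mathit{atom}(a)$ such that

$$[\![\mathit{atom}(a)]\!] = \{a\}.$$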


Example 1. The theory of linear integer arithmetic forms an effective BA, where
$\mathcal{D} = \mathbb{Z}$ and $\Psi$ contains formulas $\varphi(x)$ in the theory with one fixed integer variable.
For example, $\mathit{div}_k := (x \bmod k) = 0$ denotes the set of all integers divisible by $k$.
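As an illustration only, the Python sketch below realizes an effective Boolean algebra whose predicates are stored extensionally as their denotations. It assumes a hypothetical finite window `DOMAIN` standing in for $\mathbb{Z}$, so `is_sat` reduces to an emptiness test; a faithful treatment of linear integer arithmetic would delegate $\mathit{isSat}$ to a decision procedure instead. The names `Pred`, `is_sat`, `atom`, and `div` are ours, not the paper's.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet

# Hypothetical finite window standing in for Z (an assumption of this sketch).
DOMAIN: FrozenSet[int] = frozenset(range(-50, 51))


@dataclass(frozen=True)
class Pred:
    """A predicate represented by its denotation [[phi]] within DOMAIN."""
    den: FrozenSet[int]

    def __and__(self, other: "Pred") -> "Pred":   # [[phi ∧ psi]] = [[phi]] ∩ [[psi]]
        return Pred(self.den & other.den)

    def __or__(self, other: "Pred") -> "Pred":    # [[phi ∨ psi]] = [[phi]] ∪ [[psi]]
        return Pred(self.den | other.den)

    def __invert__(self) -> "Pred":               # [[¬phi]] = D \ [[phi]]
        return Pred(DOMAIN - self.den)


BOT = Pred(frozenset())
TOP = Pred(DOMAIN)


def from_formula(f: Callable[[int], bool]) -> Pred:
    """Turn a Python test into a predicate by evaluating it on DOMAIN."""
    return Pred(frozenset(x for x in DOMAIN if f(x)))


def is_sat(phi: Pred) -> bool:
    """isSat(phi) holds iff [[phi]] is non-empty."""
    return bool(phi.den)


def atom(a: int) -> Pred:
    """[[atom(a)]] = {a}."""
    return Pred(frozenset({a}))


def div(k: int) -> Pred:
    """div_k := (x mod k) = 0, restricted to DOMAIN."""
    return from_formula(lambda x: x % k == 0)
```

For instance, `is_sat(div(3) & div(5) & ~atom(0))` holds because 15 lies in the window, while `is_sat(div(3) & atom(4))` does not. Storing denotations keeps every connective exact, at the price of only working over finite domains.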
Notation. Given a set $S$, we write $\mathcal{P}(S)$ for its powerset. Given a function
$f \colon A \to B$, we write $f[a \mapsto b]$ for the function such that $f[a \mapsto b](a) = b$
and $f[a \mapsto b](x) = f(x)$, for $x \neq a$. Analogously, we write $f[S \mapsto b]$, with
$S \subseteq A$, to map multiple values to the same $b$. The pre-image of $f$ is the function
$f^{-1} \colon \mathcal{P}(B) \to \mathcal{P}(A)$ given by $f^{-1}(S) = \{a \mid \exists b \in S : b = f(a)\}$; for readability,
we will write $f^{-1}(x)$ when $S = \{x\}$. Given a relation $R \subseteq A \times B$, we write $aRb$
for $(a, b) \in R$.
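The update and pre-image operations are the ones later applied to register assignments, which are finite maps. A minimal sketch on Python dictionaries follows; `update` and `preimage` are hypothetical helper names, and finite dictionaries stand in for arbitrary functions.

```python
from typing import Dict, Hashable, Iterable, Set, TypeVar

A = TypeVar("A", bound=Hashable)
B = TypeVar("B")


def update(f: Dict[A, B], S: Iterable[A], b: B) -> Dict[A, B]:
    """f[S ↦ b]: a copy of f that sends every element of S to b."""
    g = dict(f)          # f itself is left untouched
    for a in S:
        g[a] = b
    return g


def preimage(f: Dict[A, B], S: Set[B]) -> Set[A]:
    """f^{-1}(S) = {a | exists b in S: b = f(a)}."""
    return {a for a, b in f.items() if b in S}
```

With registers as keys, `preimage(v, {a})` is exactly the set $v^{-1}(a)$ of registers currently holding $a$.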


Model Definition. Symbolic register automata have transitions of the form: 


$$p \xrightarrow{\varphi/E,I,U} q$$

where $p$ and $q$ are states, $\varphi$ is a predicate from a fixed effective Boolean algebra,
and $E, I, U$ are subsets of a fixed finite set of registers $R$. The intended interpretation
of the above transition is: an input character $a$ can be read in state
$p$ if (i) $a \in [\![\varphi]\!]$, (ii) the content of all the registers in $E$ is equal to $a$, and (iii)
the content of all the registers in $I$ is different from $a$. If the transition succeeds,
then $a$ is stored into all the registers $U$ and the automaton moves to $q$.


Example 2. The transition labels in Fig. 1d have been conveniently simplified to 
ease intuition. These labels correspond to full SRA labels as follows: 


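$$\varphi/{\to}r = \varphi/\emptyset,\emptyset,\{r\} \qquad \varphi/{=}r = \varphi/\{r\},\emptyset,\emptyset \qquad \varphi = \varphi/\emptyset,\emptyset,\emptyset.$$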


Given a set of registers $R$, the transitions of an SRA have labels over the following
set: $\mathcal{L}_R = \Psi \times \{(E, I, U) \in \mathcal{P}(R) \times \mathcal{P}(R) \times \mathcal{P}(R) \mid E \cap I = \emptyset\}$. The condition
$E \cap I = \emptyset$ guarantees that register constraints are always satisfiable.


Definition 1 (Symbolic Register Automaton). A symbolic register
automaton (SRA) is a 6-tuple $(R, Q, q_0, v_0, F, \Delta)$, where $R$ is a finite set of registers,
$Q$ is a finite set of states, $q_0 \in Q$ is the initial state, $v_0 \colon R \to \mathcal{D} \cup \{\sharp\}$ is
the initial register assignment (if $v_0(r) = \sharp$, the register $r$ is considered empty),
$F \subseteq Q$ is a finite set of final states, and $\Delta \subseteq Q \times \mathcal{L}_R \times Q$ is the transition
relation. Transitions $(p, (\varphi, \ell), q) \in \Delta$ will be written as $p \xrightarrow{\varphi/\ell} q$.

An SRA can be seen as a finite description of a (possibly infinite) labeled tran- 
sition system (LTS), where states have been assigned concrete register values, 
and transitions read a single symbol from the potentially infinite alphabet. This 
so-called configuration LTS will be used in defining the semantics of SRAs. 


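Definition 2 (Configuration LTS). Given an SRA $\mathcal{S}$, the configuration LTS
$\mathrm{CLTS}(\mathcal{S})$ is defined as follows. A configuration is a pair $(p, v)$ where $p \in Q$ is
a state in $\mathcal{S}$ and $v \colon R \to \mathcal{D} \cup \{\sharp\}$ is a register assignment; $(q_0, v_0)$ is called the
initial configuration; every $(q, v)$ such that $q \in F$ is a final configuration. The
set of transitions between configurations is defined as follows:

$$\frac{p \xrightarrow{\varphi/E,I,U} q \in \Delta \qquad a \in [\![\varphi]\!] \qquad E \subseteq v^{-1}(a) \qquad I \cap v^{-1}(a) = \emptyset}{(p, v) \xrightarrow{a} (q, v[U \mapsto a]) \in \mathrm{CLTS}(\mathcal{S})}$$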
Intuitively, the rule says that an SRA transition from $p$ can be instantiated to
one from $(p, v)$ that reads $a$ when the registers containing the value $a$, namely
$v^{-1}(a)$, satisfy the constraint described by $E, I$ ($a$ is contained in registers $E$
but not in $I$). If the constraint is satisfied, all registers in $U$ are assigned $a$.

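A run of the SRA $\mathcal{S}$ is a sequence of transitions in $\mathrm{CLTS}(\mathcal{S})$ starting from the
initial configuration. A configuration is reachable whenever there is a run ending
in that configuration. The language of an SRA $\mathcal{S}$ is defined as

$$\mathcal{L}(\mathcal{S}) := \{a_1 \cdots a_n \in \mathcal{D}^* \mid \exists\, (q_0, v_0) \xrightarrow{a_1} \cdots \xrightarrow{a_n} (q_n, v_n) \in \mathrm{CLTS}(\mathcal{S}),\ q_n \in F\}$$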
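The rule of Definition 2 and the language $\mathcal{L}(\mathcal{S})$ can be prototyped by tracking the set of configurations reachable on a finite word. The sketch below rests on assumptions of our own: guards are Python predicates rather than formulas of an effective Boolean algebra, `None` plays the role of the empty marker $\sharp$, and the names `step` and `accepts` are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Optional, Set, Tuple

Guard = Callable[[int], bool]
# An SRA transition p --phi/E,I,U--> q, stored as (p, phi, E, I, U, q).
Transition = Tuple[str, Guard, FrozenSet[str], FrozenSet[str], FrozenSet[str], str]
# A configuration (state, assignment); the assignment is a sorted tuple of
# (register, value) pairs so that configurations are hashable.
Config = Tuple[str, Tuple[Tuple[str, Optional[int]], ...]]


def step(conf: Config, a: int, delta: List[Transition]) -> List[Config]:
    """Apply the CLTS rule to every SRA transition leaving conf on input a."""
    p, v = conf
    holding_a = {r for r, val in v if val == a}          # v^{-1}(a)
    succs = []
    for src, guard, E, I, U, q in delta:
        if src == p and guard(a) and E <= holding_a and not (I & holding_a):
            w = tuple((r, a if r in U else val) for r, val in v)  # v[U ↦ a]
            succs.append((q, w))
    return succs


def accepts(word: Iterable[int], q0: str, v0: Dict[str, Optional[int]],
            finals: Set[str], delta: List[Transition]) -> bool:
    """Decide membership in L(S) by exploring all runs on word in parallel."""
    current = {(q0, tuple(sorted(v0.items())))}
    for a in word:
        current = {c2 for c in current for c2 in step(c, a, delta)}
    return any(q in finals for q, _ in current)


def always(_: int) -> bool:
    """The guard ⊤."""
    return True


# Example RA: words of length >= 2 whose last element equals the first one.
NONE: FrozenSet[str] = frozenset()
R = frozenset({"r"})
first_eq_last: List[Transition] = [
    ("0", always, NONE, NONE, R, "1"),      # store the first element in r
    ("1", always, NONE, NONE, NONE, "1"),   # skip middle elements
    ("1", always, R, NONE, NONE, "2"),      # accept an element equal to r
]
```

Keeping a set of configurations handles nondeterminism directly; it only terminates on finite words and is not a decision procedure for emptiness.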


An SRA 8 is deterministic if its configuration LTS is; namely, for every word 
$w \in \mathcal{D}^*$ there is at most one run in $\mathrm{CLTS}(\mathcal{S})$ spelling $w$. Determinism is important
for some application contexts, e.g., for runtime monitoring. Since SRAs subsume 
RAs, nondeterministic SRAs are strictly more expressive than deterministic ones, 
and language equivalence is undecidable for nondeterministic SRAs [27]. 

We now introduce the notions of simulation and bisimulation for SRAs, which 
capture whether one SRA behaves “at least as” or “exactly as” another one. 


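Definition 3 ((Bi)simulation for SRAs). A simulation $\mathcal{R}$ on SRAs $\mathcal{S}_1$ and
$\mathcal{S}_2$ is a binary relation $\mathcal{R}$ on configurations such that $(p_1, v_1)\,\mathcal{R}\,(p_2, v_2)$ implies:

- if $p_1 \in F_1$ then $p_2 \in F_2$;
- for each transition $(p_1, v_1) \xrightarrow{a} (q_1, w_1)$ in $\mathrm{CLTS}(\mathcal{S}_1)$, there exists a transition
$(p_2, v_2) \xrightarrow{a} (q_2, w_2)$ in $\mathrm{CLTS}(\mathcal{S}_2)$ such that $(q_1, w_1)\,\mathcal{R}\,(q_2, w_2)$.

A simulation $\mathcal{R}$ is a bisimulation if $\mathcal{R}^{-1}$ is also a simulation. We write $\mathcal{S}_1 \prec \mathcal{S}_2$
(resp. $\mathcal{S}_1 \sim \mathcal{S}_2$) whenever there is a simulation (resp. bisimulation) $\mathcal{R}$ such that
$(q_{01}, v_{01})\,\mathcal{R}\,(q_{02}, v_{02})$, where $(q_{0i}, v_{0i})$ is the initial configuration of $\mathcal{S}_i$, for $i = 1, 2$.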
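Definition 3 quantifies over configurations, so it can be checked literally only when the configuration LTSs are finite, e.g., for register-free SRAs that encode SFAs; the symbolic technique of Sect. 4 is what handles infinite CLTSs. For the finite case, the sketch below computes the greatest simulation by the standard refinement: start from all pairs that respect the final-state clause and repeatedly drop pairs that violate the transfer clause. The explicit-LTS encoding and the name `greatest_simulation` are assumptions of this illustration.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

LTS = Dict[Hashable, List[Tuple[Hashable, Hashable]]]  # config -> [(letter, successor)]


def greatest_simulation(confs1: Iterable[Hashable], confs2: Iterable[Hashable],
                        lts1: LTS, lts2: LTS,
                        final1: Set[Hashable], final2: Set[Hashable]) -> Set[Tuple]:
    """Largest relation on two finite LTSs satisfying both clauses of Definition 3."""
    confs2 = list(confs2)
    # Clause 1: a final configuration may only be related to a final one.
    rel = {(c1, c2) for c1 in confs1 for c2 in confs2
           if c1 not in final1 or c2 in final2}
    changed = True
    while changed:
        changed = False
        for c1, c2 in list(rel):
            # Clause 2: every move of c1 must be matched by c2 on the same letter.
            for a, d1 in lts1.get(c1, []):
                if not any(b == a and (d1, d2) in rel for b, d2 in lts2.get(c2, [])):
                    rel.discard((c1, c2))
                    changed = True
                    break
    return rel


lts1 = {"x0": [("a", "x1")]}
lts2 = {"y0": [("a", "y1"), ("b", "y2")]}
sim = greatest_simulation(["x0", "x1"], ["y0", "y1", "y2"], lts1, lts2, {"x1"}, {"y1"})
rev = greatest_simulation(["y0", "y1", "y2"], ["x0", "x1"], lts2, lts1, {"y1"}, {"x1"})
```

Here `("x0", "y0")` survives, so the first LTS is simulated by the second, while the reverse check fails because `y0` can read `b` and `x0` cannot.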


We say that an SRA is complete whenever for every configuration $(p, v)$ and
$a \in \mathcal{D}$ there is a transition $(p, v) \xrightarrow{a} (q, w)$ in $\mathrm{CLTS}(\mathcal{S})$. The following results
connect similarity and language inclusion.


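Proposition 1. If $\mathcal{S}_1 \prec \mathcal{S}_2$ then $\mathcal{L}(\mathcal{S}_1) \subseteq \mathcal{L}(\mathcal{S}_2)$. If $\mathcal{S}_1$ and $\mathcal{S}_2$ are deterministic
and complete, then the other direction also holds.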


It is worth noting that given a deterministic SRA we can define its completion 
by adding transitions so that every value $a \in \mathcal{D}$ can be read from any state.


Remark 1. RAs and SFAs can be encoded as SRAs on the same state-space:

- an RA is encoded as an SRA with all transition guards $\top$;
- an SFA can be encoded as an SRA with $R = \emptyset$, with each SFA transition
$p \xrightarrow{\varphi} q$ encoded as $p \xrightarrow{\varphi/\emptyset,\emptyset,\emptyset} q$. Note that the absence of registers implies that
the CLTS always has finitely many configurations.


SRAs are strictly more expressive than both RAs and SFAs. For instance, the
language $\{n_0 \cdots n_k \mid n_0 = n_k,\ \mathit{even}(n_i),\ n_i \in \mathbb{Z},\ i = 0, \dots, k\}$ of finite sequences
of even integers where the first and last one coincide can be recognized by an
SRA, but not by an RA or by an SFA.
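To see both features at work, here is one hypothetical five-transition SRA for this language with a small membership check. The guard `even` needs the alphabet theory, and the equality test against `r` needs a register. State names, the encoding, and `accepts` are illustrative assumptions, not the paper's construction; configurations are pairs (state, content of `r`), with `None` for an empty register.

```python
from typing import Callable, FrozenSet, List, Tuple

Transition = Tuple[str, Callable[[int], bool], FrozenSet[str], FrozenSet[str], FrozenSet[str], str]


def even(n: int) -> bool:
    return n % 2 == 0


NONE: FrozenSet[str] = frozenset()
R = frozenset({"r"})
# Hypothetical SRA for {n0 ... nk | n0 = nk, all ni even}; finals are "1" and "3".
DELTA: List[Transition] = [
    ("0", even, NONE, NONE, R, "1"),     # store n0 in r (k = 0 is accepted here)
    ("1", even, NONE, NONE, NONE, "2"),  # any even middle element
    ("1", even, R, NONE, NONE, "3"),     # n1 = n0 ends a sequence of length 2
    ("2", even, NONE, NONE, NONE, "2"),
    ("2", even, R, NONE, NONE, "3"),     # last element equals the stored n0
]
FINALS = {"1", "3"}


def accepts(word: List[int]) -> bool:
    """Run the SRA on word; configurations are (state, value of r)."""
    current = {("0", None)}
    for a in word:
        nxt = set()
        for p, val in current:
            holding = R if val == a else NONE          # v^{-1}(a) with one register
            for src, guard, E, I, U, q in DELTA:
                if src == p and guard(a) and E <= holding and not (I & holding):
                    nxt.add((q, a if U else val))      # U is either {r} or empty
        current = nxt
    return any(p in FINALS for p, _ in current)
```

For example, `accepts([4, 6, 4])` holds, while `[4, 6]` (last differs from first) and `[4, 5, 4]` (odd element) are rejected.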
Boolean Closure Properties. SRAs are closed under intersection and union.
Intersection is given by a standard product construction whereas union is 
obtained by adding a new initial state that mimics the initial states of both 
automata. 


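Proposition 2 (Closure under intersection and union). Given SRAs $\mathcal{S}_1$
and $\mathcal{S}_2$, there are SRAs $\mathcal{S}_1 \cap \mathcal{S}_2$ and $\mathcal{S}_1 \cup \mathcal{S}_2$ such that $\mathcal{L}(\mathcal{S}_1 \cap \mathcal{S}_2) = \mathcal{L}(\mathcal{S}_1) \cap \mathcal{L}(\mathcal{S}_2)$
and $\mathcal{L}(\mathcal{S}_1 \cup \mathcal{S}_2) = \mathcal{L}(\mathcal{S}_1) \cup \mathcal{L}(\mathcal{S}_2)$.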
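One way to realize the product construction for intersection is sketched below, under our own assumptions: guards are Python predicates (so $\varphi_1 \wedge \varphi_2$ becomes a conjunction of calls), the registers of the two automata are renamed apart by tagging them with 1 or 2, and each product transition takes the union of the renamed $E$, $I$, and $U$ sets. Because the renamed register sets are disjoint, the side condition $E \cap I = \emptyset$ is preserved. All names are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Hashable, List, Optional, Set, Tuple

Guard = Callable[[int], bool]
Trans = Tuple[Hashable, Guard, FrozenSet, FrozenSet, FrozenSet, Hashable]
# An SRA as (initial state, initial assignment, final states, transitions).
SRA = Tuple[Hashable, Dict[Hashable, Optional[int]], Set[Hashable], List[Trans]]


def tag(i: int, regs) -> FrozenSet:
    """Rename registers apart: register r of automaton i becomes (i, r)."""
    return frozenset((i, r) for r in regs)


def conj(g1: Guard, g2: Guard) -> Guard:
    """Guard of a product transition: [[phi1 ∧ phi2]]."""
    return lambda a: g1(a) and g2(a)


def intersect(s1: SRA, s2: SRA) -> SRA:
    """Synchronous product: both automata read the same input symbol."""
    q1, v1, f1, d1 = s1
    q2, v2, f2, d2 = s2
    v0 = {(1, r): x for r, x in v1.items()}
    v0.update({(2, r): x for r, x in v2.items()})
    finals = {(p1, p2) for p1 in f1 for p2 in f2}
    delta = [((p1, p2), conj(g1, g2),
              tag(1, E1) | tag(2, E2), tag(1, I1) | tag(2, I2), tag(1, U1) | tag(2, U2),
              (t1, t2))
             for (p1, g1, E1, I1, U1, t1) in d1
             for (p2, g2, E2, I2, U2, t2) in d2]
    return (q1, q2), v0, finals, delta


def accepts(s: SRA, word: List[int]) -> bool:
    """Membership test via the CLTS rule of Definition 2."""
    q0, v0, finals, delta = s
    current = [(q0, v0)]
    for a in word:
        nxt = []
        for p, v in current:
            holding = {r for r, x in v.items() if x == a}
            for src, g, E, I, U, q in delta:
                if src == p and g(a) and E <= holding and not (I & holding):
                    nxt.append((q, {r: (a if r in U else x) for r, x in v.items()}))
        current = nxt
    return any(p in finals for p, _ in current)


NONE: FrozenSet = frozenset()
R = frozenset({"r"})
first_eq_last: SRA = ("0", {"r": None}, {"2"}, [
    ("0", lambda a: True, NONE, NONE, R, "1"),
    ("1", lambda a: True, NONE, NONE, NONE, "1"),
    ("1", lambda a: True, R, NONE, NONE, "2"),
])
all_positive: SRA = ("s", {}, {"s"}, [("s", lambda a: a > 0, NONE, NONE, NONE, "s")])
both = intersect(first_eq_last, all_positive)
```

Union can be handled analogously by a fresh initial state that copies the outgoing transitions of both initial states, as described in the text.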


SRAs in general are not closed under complementation, because RAs are not. 
However, we still have closure under complementation for a subclass of SRAs. 


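Proposition 3. Let $\mathcal{S}$ be a complete and deterministic SRA, and let $\overline{\mathcal{S}}$ be the
SRA defined as $\mathcal{S}$, except that its final states are $Q \setminus F$. Then $\mathcal{L}(\overline{\mathcal{S}}) = \mathcal{D}^* \setminus \mathcal{L}(\mathcal{S})$.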


4 Decidability Properties 


In this section we will provide algorithms for checking determinism and emptiness 
for an SRA, and (bi)similarity of two SRAs. Our algorithms leverage symbolic 
techniques that use the finite syntax of SRAs to indirectly operate over the 
underlying configuration LTS, which can be infinite. 


Single-Valued Variant. To study decidability, it is convenient to restrict register
assignments to injective ones on non-empty registers, that is, functions
$v \colon R \to \mathcal{D} \cup \{\sharp\}$ such that $v(r) = v(s)$ and $v(r) \neq \sharp$ imply $r = s$. This is
also the approach taken for RAs in the seminal papers [17,27]. Both for RAs
and SRAs, this restriction does not affect expressivity. We say that an SRA is
single-valued if its initial assignment $v_0$ is injective on non-empty registers. For
single-valued SRAs, we only allow two kinds of transitions:

Read transition: $p \xrightarrow{\varphi/r^{=}} q$ triggers when $a \in [\![\varphi]\!]$ and $a$ is already stored in $r$.
Fresh transition: $p \xrightarrow{\varphi/r^{\bullet}} q$ triggers when the input $a \in [\![\varphi]\!]$ and $a$ is fresh, i.e.,
is not stored in any register. After the transition, $a$ is stored into $r$.
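The two kinds of single-valued transitions can be stated operationally on a concrete register assignment. In this sketch `None` marks an empty register, a Python predicate plays the role of $\varphi$, and the helper names `read`, `fresh`, and `is_injective` are hypothetical; each transition function returns the resulting assignment, or `None` when the transition cannot fire.

```python
from typing import Callable, Dict, Optional

Assignment = Dict[str, Optional[int]]  # None marks an empty register (sharp)
Guard = Callable[[int], bool]


def is_injective(v: Assignment) -> bool:
    """Single-valuedness: distinct non-empty registers hold distinct values."""
    vals = [x for x in v.values() if x is not None]
    return len(vals) == len(set(vals))


def read(phi: Guard, r: str, v: Assignment, a: int) -> Optional[Assignment]:
    """Read transition phi/r^=: fires iff a ∈ [[phi]] and v(r) = a; v is unchanged."""
    return dict(v) if phi(a) and v.get(r) == a else None


def fresh(phi: Guard, r: str, v: Assignment, a: int) -> Optional[Assignment]:
    """Fresh transition phi/r^•: fires iff a ∈ [[phi]] and a is in no register;
    the result v[r ↦ a] stays injective because a was fresh."""
    if phi(a) and a not in v.values():
        w = dict(v)
        w[r] = a
        return w
    return None


def even(n: int) -> bool:
    return n % 2 == 0
```

With `v = {"r": None, "s": 4}`, `fresh(even, "r", v, 6)` yields `{"r": 6, "s": 4}`, whereas `fresh(even, "r", v, 4)` fails because 4 is already stored in `s`.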


SRAs and their single-valued variants have the same expressive power. Translating
single-valued SRAs to ordinary ones is straightforward:

$$p \xrightarrow{\varphi/r^{=}} q \;\Rightarrow\; p \xrightarrow{\varphi/\{r\},\emptyset,\emptyset} q \qquad\qquad p \xrightarrow{\varphi/r^{\bullet}} q \;\Rightarrow\; p \xrightarrow{\varphi/\emptyset,R,\{r\}} q$$

The opposite translation requires a state-space blow-up, because we need to
encode register equalities in the states.
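The single-valued to ordinary direction is purely syntactic. The sketch below (hypothetical names `to_ordinary` and `fires`) produces the $(E, I, U)$ triple and checks, on a sample injective assignment, that the register side-conditions of Definition 2 admit exactly the symbols the single-valued semantics would.

```python
from typing import FrozenSet, Iterable, Tuple

Label = Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str]]  # (E, I, U)


def to_ordinary(kind: str, r: str, registers: Iterable[str]) -> Label:
    """Translate a single-valued label r^= / r^• into an ordinary (E, I, U) label."""
    if kind == "read":    # phi/r^=  becomes  phi/{r},∅,∅
        return frozenset({r}), frozenset(), frozenset()
    if kind == "fresh":   # phi/r^•  becomes  phi/∅,R,{r}
        return frozenset(), frozenset(registers), frozenset({r})
    raise ValueError(f"unknown transition kind: {kind}")


def fires(label: Label, v: dict, a: int) -> bool:
    """Register side-conditions of the CLTS rule: E ⊆ v^{-1}(a), I ∩ v^{-1}(a) = ∅."""
    E, I, _ = label
    holding = {x for x, val in v.items() if val == a}
    return E <= holding and not (I & holding)
```

Setting $I = R$ is what encodes freshness: the transition is blocked as soon as any register already holds the input.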


Theorem 1. Given an SRA $\mathcal{S}$ with $n$ states and $r$ registers, there is a single-valued
SRA $\mathcal{S}'$ with $O(nr^r)$ states and $r+1$ registers such that $\mathcal{S} \sim \mathcal{S}'$. Moreover,
the translation preserves determinism.
Normalization. While our techniques are inspired by analogous ones for non-symbolic
RAs, SRAs present an additional challenge: they can have arbitrary
predicates on transitions. Hence, the values that each transition can read, and
thus which configurations it can reach, depend on the history of past transitions
and their predicates. This problem emerges when checking reachability and similarity,
because a transition may be disabled by particular register values, and so
lead to unsound conclusions, a problem that does not exist in register automata.


Example 3. Consider the SRA below, defined over the BA of integers. 


[Figure: SRA $\mathcal{S}$ with states 0, 1, 2 over the integers and initial assignment $v_0(r) = 0$.]


All predicates on transitions are satisfiable, yet $\mathcal{L}(\mathcal{S}) = \emptyset$. To go from 0 to 1, $\mathcal{S}$
must read a value $n$ such that $\mathit{div}_3(n)$ and $n \neq 0$, and then $n$ is stored into $r$. The
transition from 1 to 2 can only happen if the content of $r$ also satisfies $\mathit{div}_5(n)$ and
$n \in [0, 10]$. However, there is no $n$ satisfying $\mathit{div}_3(n) \wedge n \neq 0 \wedge \mathit{div}_5(n) \wedge n \in [0, 10]$;
hence the transition from 1 to 2 never happens.
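Because the second condition confines $n$ to $[0, 10]$, the argument of Example 3 can be confirmed by exhaustive enumeration. The sketch encodes the two conditions as Python functions; the names `enters_1` and `leaves_1` are ours.

```python
def div(k: int):
    """div_k := (x mod k) = 0."""
    return lambda n: n % k == 0


def enters_1(n: int) -> bool:
    """Guard of the fresh transition 0 -> 1; freshness w.r.t. v0(r) = 0 adds n != 0."""
    return div(3)(n) and n != 0


def leaves_1(n: int) -> bool:
    """Condition the value stored in r must meet for the read transition 1 -> 2."""
    return div(5)(n) and 0 <= n <= 10


# leaves_1 already restricts n to [0, 10], so this range is an exhaustive search.
witnesses = [n for n in range(0, 11) if enters_1(n) and leaves_1(n)]
```

`witnesses` is empty even though each condition is satisfiable on its own (e.g., 3 and 5), which is exactly the situation that normalization has to detect.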


To handle the complexity caused by predicates, we introduce a way of normaliz- 
ing an SRA to an equivalent one that stores additional information about input 
predicates. We first introduce some notation and terminology. 

A register abstraction $\theta$ for $\mathcal{S}$, used to “keep track” of the domain of registers,
is a family of predicates indexed by the registers $R$ of $\mathcal{S}$. Given a register
assignment $v$, we write $v \models \theta$ whenever $v(r) \in [\![\theta_r]\!]$ for $v(r) \neq \sharp$, and $\theta_r = \bot$
otherwise. Hereafter we shall only consider “meaningful” register abstractions,
for which there is at least one assignment $v$ such that $v \models \theta$.

With the contextual information about register domains given by $\theta$, we say
that a transition $p \xrightarrow{\varphi/\ell} q \in \Delta$ is enabled by $\theta$ whenever it has at least one instance
$(p, v) \xrightarrow{a} (q, w)$ in $\mathrm{CLTS}(\mathcal{S})$, for all $v \models \theta$. Enabled transitions are important when
reasoning about reachability and similarity.

Checking whether a transition has at least one realizable instance in the CLTS
is difficult in practice, especially when $\ell = r^{\bullet}$, because it amounts to checking
whether $[\![\varphi]\!] \setminus \mathit{img}(v) \neq \emptyset$, for all injective $v \models \theta$.

To make the check for enabledness practical we will use minterms. For a set
of predicates $\Phi$, a minterm is a minimal satisfiable Boolean combination of all
predicates that occur in $\Phi$. Minterms are the analogue of atoms in a complete
atomic Boolean algebra. E.g., the set of predicates $\Phi = \{x > 2, x < 5\}$ over the
theory of linear integer arithmetic has minterms
$\mathit{mint}(\Phi) = \{x > 2 \wedge x < 5,\ \neg(x > 2) \wedge x < 5,\ x > 2 \wedge \neg(x < 5)\}$. Given $\psi \in \mathit{mint}(\Phi)$ and $\varphi \in \Phi$, we will write $\varphi \sqsubseteq \psi$
whenever $\varphi$ appears non-negated in $\psi$, for instance $(x > 2) \sqsubseteq (x > 2 \wedge \neg(x < 5))$.
A crucial property of minterms is that they do not overlap, i.e., $\mathit{isSat}(\psi_1 \wedge \psi_2)$
if and only if $\psi_1 = \psi_2$, for $\psi_1$ and $\psi_2$ minterms.
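Over a finite domain, minterms can be computed without enumerating all $2^{|\Phi|}$ sign vectors: grouping elements by which predicates they satisfy yields exactly the satisfiable combinations, with pairwise-disjoint denotations by construction. This sketch assumes a finite window of integers (here [-10, 10], which happens to witness every satisfiable minterm of the example); over an infinite theory one would instead refine minterms incrementally with the algebra's $\mathit{isSat}$. The names `minterms` and `sqsubseteq` are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple

Pred = Callable[[int], bool]


def minterms(preds: List[Pred], domain: Iterable[int]) -> Dict[Tuple[bool, ...], FrozenSet[int]]:
    """Group domain elements by the sign vector of preds they satisfy.

    Each key is a satisfiable minterm (True = predicate occurs non-negated) and
    the value is its denotation; unsatisfiable sign vectors never appear.
    """
    groups: Dict[Tuple[bool, ...], set] = {}
    for x in domain:
        groups.setdefault(tuple(p(x) for p in preds), set()).add(x)
    return {signs: frozenset(xs) for signs, xs in groups.items()}


def sqsubseteq(i: int, psi: Tuple[bool, ...]) -> bool:
    """phi_i ⊑ psi: predicate i appears non-negated in the minterm psi."""
    return psi[i]


# Phi = {x > 2, x < 5}, evaluated on a hypothetical finite window of integers.
mint = minterms([lambda x: x > 2, lambda x: x < 5], range(-10, 11))
```

As in the text, only three minterms appear: the combination $\neg(x > 2) \wedge \neg(x < 5)$ has no witness.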


Lemma 1 (Enabledness). Let $\theta$ be a register abstraction such that $\theta_r$ is a
minterm, for all $r \in R$. If $\varphi$ is a minterm, then $p \xrightarrow{\varphi/\ell} q$ is enabled by $\theta$ iff:
(1) if $\ell = r^{=}$, then $\varphi = \theta_r$; (2) if $\ell = r^{\bullet}$, then $|[\![\varphi]\!]| > \mathcal{E}(\theta, \varphi)$,
where $\mathcal{E}(\theta, \varphi) = |\{r \in R \mid \theta_r = \varphi\}|$ is the number of registers with values from $[\![\varphi]\!]$.


Intuitively, (1) says that if the transition reads a symbol stored in $r$ satisfying $\varphi$,
the symbol must also satisfy $\theta_r$, the range of $r$. Because $\varphi$ and $\theta_r$ are minterms,
this only happens when $\varphi = \theta_r$. (2) says that the enabling condition
$[\![\varphi]\!] \setminus \mathit{img}(v) \neq \emptyset$, for all injective $v \models \theta$, holds if and only if there are fewer registers
storing values from $[\![\varphi]\!]$ than the cardinality of $[\![\varphi]\!]$. That implies we can always
find a fresh element in $[\![\varphi]\!]$ to enable the transition. Registers holding values
from $[\![\varphi]\!]$ are exactly those $r \in R$ such that $\theta_r = \varphi$. Both conditions can be
effectively checked: the first one is a simple predicate-equivalence check, while the
second one amounts to checking whether $\varphi$ holds for at least a certain number
$k$ of distinct elements. This can be achieved by checking satisfiability of
$\varphi \wedge \neg\mathit{atom}(a_1) \wedge \cdots \wedge \neg\mathit{atom}(a_{k-1})$, for $a_1, \dots, a_{k-1}$ distinct elements of $[\![\varphi]\!]$.
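Lemma 1 and the incremental satisfiability check just described can be sketched as follows, assuming minterms with finite denotations given as frozensets, so that picking a witness is a set lookup; with a solver-backed algebra each step would instead be an $\mathit{isSat}$ query followed by model extraction. Function names and the `("read" | "fresh", register)` label encoding are hypothetical.

```python
from typing import Dict, FrozenSet, Optional, Tuple

Minterm = FrozenSet[int]                     # a minterm, given by its finite denotation
Abstraction = Dict[str, Optional[Minterm]]   # theta; None stands for ⊥


def has_at_least(phi: Minterm, k: int) -> bool:
    """Does phi hold for k distinct elements?  Mirrors checking
    isSat(phi ∧ ¬atom(a_1) ∧ ... ∧ ¬atom(a_{k-1})) with witnesses a_i."""
    excluded = set()
    for _ in range(k):
        witness = next((x for x in phi if x not in excluded), None)
        if witness is None:          # the current conjunction is unsatisfiable
            return False
        excluded.add(witness)        # conjoin ¬atom(witness)
    return True


def enabled(theta: Abstraction, label: Tuple[str, str], phi: Minterm) -> bool:
    """Lemma 1 for a minterm guard phi and a single-valued label."""
    kind, r = label
    if kind == "read":               # condition (1): phi must coincide with theta_r
        return theta[r] == phi
    occupied = sum(1 for m in theta.values() if m == phi)   # E(theta, phi)
    return has_at_least(phi, occupied + 1)                  # (2): |[[phi]]| > E
```

For example, a fresh transition guarded by the singleton minterm `{0}` is disabled when some register is already abstracted by `{0}`.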


Remark 2. Using single-valued SRAs to check enabledness might seem like a 
restriction. However, if one were to start from a generic SRA, the process to
check enabledness would contain an extra step: for each state $p$, we would have
to keep track of all possible equations among registers. In fact, register equalities
determine (i) whether the register constraints of an outgoing transition are satisfiable,
and (ii) how many elements of the guard we need for the transition to happen,
analogously to condition 2 of Lemma 1. Generating such equations is the key
idea behind Theorem 1, and corresponds precisely to turning the SRA into a 
single-valued one. 


Given any SRA, we can use the notion of register abstraction to build an equivalent
normalized SRA, where (i) states keep track of how the domains of registers
change along transitions, and (ii) transitions are obtained by breaking the ones of the
original SRA into minterms and discarding those that are disabled according
to Lemma 1. In the following we write $\mathit{mint}(\mathcal{S})$ for the minterms for the set of
predicates $\{\varphi \mid p \xrightarrow{\varphi/\ell} q \in \Delta\} \cup \{\mathit{atom}(v_0(r)) \mid v_0(r) \in \mathcal{D}, r \in R\}$. Observe that
an atomic predicate always has an equivalent minterm; hence we will use atomic
predicates to define the initial register abstraction.


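Definition 4 (Normalized SRA). Given an SRA $\mathcal{S}$, its normalization $N(\mathcal{S})$
is the SRA $(R, N(Q), N(q_0), v_0, N(F), N(\Delta))$ where:

- $N(Q) = \{\theta \mid \theta$ is a register abstraction over $\mathit{mint}(\mathcal{S}) \cup \{\bot\}\} \times Q$; we will write
$\theta \triangleright q$ for $(\theta, q) \in N(Q)$;
- $N(q_0) = \theta_0 \triangleright q_0$, where $(\theta_0)_r = \mathit{atom}(v_0(r))$ if $v_0(r) \in \mathcal{D}$, and $(\theta_0)_r = \bot$ if
$v_0(r) = \sharp$;
- $N(F) = \{\theta \triangleright p \in N(Q) \mid p \in F\}$;
- $N(\Delta) = \{\theta \triangleright p \xrightarrow{\theta_r/r^{=}} \theta \triangleright q \mid p \xrightarrow{\varphi/r^{=}} q \in \Delta,\ \varphi \sqsubseteq \theta_r\}\ \cup$
$\{\theta \triangleright p \xrightarrow{\psi/r^{\bullet}} \theta[r \mapsto \psi] \triangleright q \mid p \xrightarrow{\varphi/r^{\bullet}} q \in \Delta,\ \psi \in \mathit{mint}(\mathcal{S}),\ \varphi \sqsubseteq \psi,\ |[\![\psi]\!]| > \mathcal{E}(\theta, \psi)\}$.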
The automaton $N(\mathcal{S})$ enjoys the desired property: each transition from $\theta \triangleright p$ is
enabled by $\theta$, by construction. $N(\mathcal{S})$ is always finite. In fact, suppose $\mathcal{S}$ has $n$
states, $m$ transitions and $r$ registers. Then $\mathcal{S}$ has at most $m$ predicates, and
$|\mathit{mint}(\mathcal{S})|$ is $O(2^m)$. Since the possible register abstractions are $O(r2^m)$, $N(\mathcal{S})$ has
$O(nr2^m)$ states and $O(mr^2 2^{3m})$ transitions.


Example 4. We now show the normalized version of Example 3. The first step is
computing the set $\mathit{mint}(\mathcal{S})$ of minterms for $\mathcal{S}$, i.e., the satisfiable Boolean combinations
of $\{\mathit{atom}(0), \mathit{div}_3, [0,10] \wedge \mathit{div}_5, {<}0 \vee {>}10\}$. For simplicity, we represent
minterms as bitvectors where a 0 component means that the corresponding predicate
is negated, e.g., $[1,1,1,0]$ stands for the minterm $\mathit{atom}(0) \wedge \mathit{div}_3 \wedge ([0,10] \wedge \mathit{div}_5) \wedge
\neg({<}0 \vee {>}10)$. Minterms and the resulting SRA $N(\mathcal{S})$ are shown below.


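$$\mathit{mint}(\mathcal{S}) = \{[0,1,0,0],\ [0,0,0,0],\ [0,0,0,1],\ [0,0,1,0],\ [0,1,0,1],\ [1,1,1,0]\}$$

[Figure: the normalized SRA $N(\mathcal{S})$, with initial register abstraction $\theta_0 = [r \mapsto \mathit{atom}(0)]$; the states obtained from state 1 carry register abstractions of the form $[r \mapsto m]$, e.g., $[r \mapsto [0,1,0,1]]$.]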


On each transition we show how it is broken down into minterms, and for each
state we show the register abstraction (note that state 1 becomes two states in
$N(\mathcal{S})$). The transition from 1 to 2 is not part of $N(\mathcal{S})$, which is why it is dotted. In
fact, in every register abstraction $[r \mapsto m]$ reachable at state 1, the component
for the transition guard $[0,10] \wedge \mathit{div}_5$ in the minterm $m$ (3rd component) is 0, i.e.,
$([0,10] \wedge \mathit{div}_5) \not\sqsubseteq m$. Intuitively, this means that $r$ will never be assigned a value
that satisfies $[0,10] \wedge \mathit{div}_5$. As a consequence, the construction of Definition 4 will
not add a transition from 1 to 2.
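The minterms of Example 4 and the two abstractions reached at state 1 can be recomputed mechanically. The sketch below approximates $\mathbb{Z}$ by the window [-20, 20] (an assumption that happens to witness all six minterms here), encodes minterms as bit tuples in the order used above, and applies condition (2) of Lemma 1 to the fresh transition out of state 0.

```python
# Predicates of Example 3, in the bit order used for the bitvectors above.
PREDS = [
    lambda x: x == 0,                        # atom(0)
    lambda x: x % 3 == 0,                    # div_3
    lambda x: 0 <= x <= 10 and x % 5 == 0,   # [0,10] ∧ div_5
    lambda x: x < 0 or x > 10,               # <0 ∨ >10
]
WINDOW = range(-20, 21)  # assumption: witnesses every satisfiable minterm here


def bits(x: int) -> tuple:
    return tuple(int(p(x)) for p in PREDS)


mint = {}
for x in WINDOW:
    mint.setdefault(bits(x), set()).add(x)

theta0 = bits(0)  # atom(0) is equivalent to the minterm [1,1,1,0]

# Fresh transition 0 -> 1 with guard div_3: keep the div_3 minterms psi with
# |[[psi]]| > E(theta0, psi); only register r is occupied, by theta0.
# (Denotations are counted inside WINDOW, which suffices for this comparison.)
state1 = sorted(
    m for m, den in mint.items()
    if m[1] == 1 and len(den) > (1 if m == theta0 else 0)
)

# Read transition 1 -> 2 needs [0,10] ∧ div_5 ⊑ m, i.e., bit 2 set.
reads_1_to_2 = [m for m in state1 if m[2] == 1]
```

`state1` comes out as `[(0, 1, 0, 0), (0, 1, 0, 1)]`, matching the two copies of state 1, and `reads_1_to_2` is empty, so no transition from 1 to 2 is generated.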




Finally, we show that the normalized SRA behaves exactly as the original one. 


Proposition 4. $(p, v) \sim (\theta \triangleright p, v)$, for all $p \in Q$ and $v \models \theta$. Hence, $\mathcal{S} \sim N(\mathcal{S})$.


Emptiness and Determinism. The transitions of $N(\mathcal{S})$ are always enabled
by construction; therefore every path in $N(\mathcal{S})$ corresponds to a run in
$\mathrm{CLTS}(N(\mathcal{S}))$.


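Lemma 2. The state $\theta \triangleright p$ is reachable in $N(\mathcal{S})$ if and only if there is a reachable
configuration $(\theta \triangleright p, v)$ in $\mathrm{CLTS}(N(\mathcal{S}))$ such that $v \models \theta$. Moreover, if $(\theta \triangleright p, v)$
is reachable, then all configurations $(\theta \triangleright p, w)$ such that $w \models \theta$ are reachable.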


Therefore, using Proposition 4, we can reduce the reachability and emptiness problems of $\mathcal{S}$ to those of $N(\mathcal{S})$.


Theorem 2 (Emptiness). There is an algorithm to decide reachability of any configuration of $\mathcal{S}$, hence whether $\mathcal{L}(\mathcal{S}) = \emptyset$.


Proof. Let $(p, v)$ be a configuration of $\mathcal{S}$. To decide whether it is reachable in $CLTS(\mathcal{S})$, we can perform a visit of $N(\mathcal{S})$ from its initial state, stopping when a state $\theta \rhd p$ such that $v \models \theta$ is reached. If we are just looking for a final state, we can stop at any state such that $p \in F$. In fact, by Proposition 4, there is a run in $CLTS(\mathcal{S})$ ending in $(p, v)$ if and only if there is a run in $CLTS(N(\mathcal{S}))$ ending in $(\theta \rhd p, v)$ such that $v \models \theta$. By Lemma 2, the latter holds if and only if there is a path in $N(\mathcal{S})$ ending in $\theta \rhd p$. This algorithm has the complexity of a standard visit of $N(\mathcal{S})$, namely $O(n r 2^m + m r^2 2^{3m})$.
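The algorithm in the proof is an ordinary graph search, sketched below in Python. The sketch assumes a hypothetical encoding: a normalized state θ ▷ p is the pair `(theta, p)`, and `successors` returns the targets of its outgoing transitions. The test "v satisfies θ" depends on the Boolean algebra, so it is passed in as a function. For brevity the sketch visits the whole reachable part instead of stopping at the first hit.

```python
from collections import deque
from typing import Any, Callable, Hashable, Iterable, Set, Tuple

NState = Tuple[Hashable, Hashable]  # (theta, p) encodes theta |> p

def visit(initial: NState,
          successors: Callable[[NState], Iterable[NState]]) -> Set[NState]:
    """Breadth-first visit of N(S); by Lemma 2 every path is a run."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def language_is_empty(initial: NState, successors, final: Set[Hashable]) -> bool:
    """L(S) is empty iff no reachable theta |> p has p in F."""
    return all(p not in final for (_, p) in visit(initial, successors))

def configuration_reachable(p: Hashable, v: Any, initial: NState, successors,
                            satisfies: Callable[[Any, Hashable], bool]) -> bool:
    """(p, v) is reachable iff some reachable theta |> p has v |= theta."""
    return any(q == p and satisfies(v, theta)
               for (theta, q) in visit(initial, successors))
```

For example, with `graph = {("t0", 1): [("t1", 2)]}` and `succ = lambda s: graph.get(s, [])`, the language is empty for final states `{3}` but not for `{2}`.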


Now that we have characterized which transitions are reachable, we define what it means for a normalized SRA to be deterministic and we show that determinism is preserved by the translation from SRAs.


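Proposition 5 (Determinism). $N(\mathcal{S})$ is deterministic if and only if for all reachable transitions $p \xrightarrow{\varphi_1/\ell_1} q_1$ and $p \xrightarrow{\varphi_2/\ell_2} q_2$ of $N(\mathcal{S})$ the following holds: $\varphi_1 \neq \varphi_2$ whenever either (1) $\ell_1 = \ell_2$ and $q_1 \neq q_2$, or (2) $\ell_1 = r^\circ$, $\ell_2 = s^\circ$, and $r \neq s$.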
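Proposition 5 is a local condition on pairs of transitions, so it can be checked with a pairwise scan. The Python sketch below is illustrative, not the library's implementation. A transition is a hypothetical tuple `(p, phi, label, q)`. The `label` is `("old", r)` when the transition reads the content of register r, and `("fresh", r)` when it reads a fresh value and stores it in r. The caller is assumed to pass only reachable transitions.

```python
from itertools import combinations
from typing import Hashable, List, Tuple

Label = Tuple[str, str]  # ("old", r) or ("fresh", r)
Transition = Tuple[Hashable, Hashable, Label, Hashable]  # (p, phi, label, q)

def is_deterministic(transitions: List[Transition]) -> bool:
    """Check Proposition 5 on a list of reachable transitions of N(S)."""
    for (p1, phi1, l1, q1), (p2, phi2, l2, q2) in combinations(transitions, 2):
        if p1 != p2 or phi1 != phi2:
            continue  # only same-source, same-minterm pairs can conflict
        same_label_other_target = l1 == l2 and q1 != q2               # case (1)
        fresh_into_other_registers = (l1[0] == "fresh" and l2[0] == "fresh"
                                      and l1[1] != l2[1])             # case (2)
        if same_label_other_target or fresh_into_other_registers:
            return False
    return True
```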


One can check determinism of an SRA by looking at its normalized version. 


Proposition 6. S is deterministic if and only if N(S) is deterministic. 


Similarity and Bisimilarity. We now introduce a symbolic technique to decide similarity and bisimilarity of SRAs. The basic idea is similar to symbolic (bi)simulation [20,27] for RAs. Recall that RAs are SRAs whose transition guards are all $\top$. Given two RAs $\mathcal{S}_1$ and $\mathcal{S}_2$, a symbolic simulation between them is defined over their state spaces $Q_1$ and $Q_2$, not over their configurations. For this to work, one needs to add an extra piece of information about how the registers of the two states are related. More precisely, a symbolic simulation is a relation on triples $(p_1, p_2, \sigma)$, where $p_1 \in Q_1$, $p_2 \in Q_2$ and $\sigma \subseteq R_1 \times R_2$ is a partial injective function. This function encodes constraints between registers: $(r, s) \in \sigma$ is an equality constraint between $r \in R_1$ and $s \in R_2$, and $(r, s) \notin \sigma$ is an inequality constraint. Intuitively, $(p_1, p_2, \sigma)$ says that all configurations $(p_1, v_1)$ and $(p_2, v_2)$ such that $v_1$ and $v_2$ satisfy $\sigma$ (e.g., $v_1(r) = v_2(s)$ whenever $(r, s) \in \sigma$) are in the simulation relation $(p_1, v_1) \precsim (p_2, v_2)$. In the following we will use $v_1 \bowtie v_2$ to denote the function encoding the constraints between $v_1$ and $v_2$, explicitly: $\sigma(r) = s$ if and only if $v_1(r) = v_2(s)$ and $v_1(r) \neq \sharp$.
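Here is a small Python sketch of $v_1 \bowtie v_2$. Valuations are dictionaries from register names to values, with `None` standing in for an empty register; both are encoding choices made for this sketch. It assumes that neither valuation stores the same value in two registers. That assumption is what makes the result a partial injective function.

```python
from typing import Dict, Hashable, Optional

Valuation = Dict[str, Optional[Hashable]]  # None marks an empty register

def join(v1: Valuation, v2: Valuation) -> Dict[str, str]:
    """sigma = v1 ⋈ v2: sigma(r) = s iff v1(r) = v2(s) and v1(r) is not empty."""
    by_value = {val: s for s, val in v2.items() if val is not None}
    return {r: by_value[val] for r, val in v1.items()
            if val is not None and val in by_value}
```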


Definition 5 (Symbolic (bi)similarity [27]). A symbolic simulation is a relation $R \subseteq Q_1 \times Q_2 \times \mathcal{P}(R_1 \times R_2)$ such that if $(p_1, p_2, \sigma) \in R$, then $p_1 \in F_1$ implies $p_2 \in F_2$, and if $p_1 \xrightarrow{\ell} q_1 \in \Delta_1$ (we keep the $\top$ guard implicit for succinctness), then:
1. if $\ell = r^\bullet$:
(a) if $r \in dom(\sigma)$, then there is $p_2 \xrightarrow{\sigma(r)^\bullet} q_2 \in \Delta_2$ such that $(q_1, q_2, \sigma) \in R$;
(b) if $r \notin dom(\sigma)$, then there is $p_2 \xrightarrow{s^\circ} q_2 \in \Delta_2$ such that $(q_1, q_2, \sigma[r \mapsto s]) \in R$.
2. if $\ell = r^\circ$:
(a) for all $s \in R_2 \setminus img(\sigma)$, there is $p_2 \xrightarrow{s^\bullet} q_2 \in \Delta_2$ such that $(q_1, q_2, \sigma[r \mapsto s]) \in R$, and
(b) there is $p_2 \xrightarrow{s^\circ} q_2 \in \Delta_2$ such that $(q_1, q_2, \sigma[r \mapsto s]) \in R$.

Here $\sigma[r \mapsto s]$ stands for $\sigma \setminus \{(\sigma^{-1}(s), s)\} \cup \{(r, s)\}$, which ensures that $\sigma$ stays injective when updated.

Given a symbolic simulation $R$, its inverse is defined as $R^{-1} = \{t^{-1} \mid t \in R\}$, where $(p_1, p_2, \sigma)^{-1} = (p_2, p_1, \sigma^{-1})$. A symbolic bisimulation $R$ is a relation such that both $R$ and $R^{-1}$ are symbolic simulations.
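The bookkeeping on σ is easy to get wrong. The Python sketch below shows the update $\sigma[r \mapsto s]$ and the inverse of a triple. It represents σ as a dictionary from registers of $R_1$ to registers of $R_2$, an encoding assumed for illustration. A dictionary holds one image per key, so the update also drops any previous image of r. This matches the uses in Definition 5: in case 1(b) r is unmapped, and in case 2 r receives a new value.

```python
from typing import Dict, Hashable, Tuple

Sigma = Dict[str, str]  # partial injective function from R1 to R2
Triple = Tuple[Hashable, Hashable, Sigma]

def update(sigma: Sigma, r: str, s: str) -> Sigma:
    """sigma[r -> s]: remove the pair (sigma^-1(s), s), then add (r, s)."""
    result = {r1: s1 for r1, s1 in sigma.items() if s1 != s}
    result[r] = s
    return result

def inverse(t: Triple) -> Triple:
    """(p1, p2, sigma)^-1 = (p2, p1, sigma^-1)."""
    p1, p2, sigma = t
    return (p2, p1, {s: r for r, s in sigma.items()})
```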


Case 1 deals with cases when $p_1$ can perform a transition that reads the register $r$. If $r \in dom(\sigma)$, meaning that $r$ and $\sigma(r) \in R_2$ contain the same value, then $p_2$ must be able to read $\sigma(r)$ as well. If $r \notin dom(\sigma)$, then the content of $r$ is fresh w.r.t. $p_2$, so $p_2$ must be able to read any fresh value, in particular the content of $r$. Case 2 deals with the cases when $p_1$ reads a fresh value. It ensures that $p_2$ is able to read all possible values that are fresh for $p_1$, be they already in some register $s$ (i.e., $s \in R_2 \setminus img(\sigma)$, case 2(a)) or fresh for $p_2$ as well (case 2(b)). In all these cases, $\sigma$ must be updated to reflect the new equalities among registers.

Keeping track of equalities among registers is enough for RAs, because the 
actual content of registers does not determine the capability of a transition to 
fire (RA transitions have implicit $\top$ guards). As seen in Example 3, this is no
longer the case for SRAs: a transition may or may not happen depending on the 
register assignment being compatible with the transition guard. 

As in the case of reachability, normalized SRAs provide the solution to this problem. We will reduce the problem of checking (bi)similarity of $\mathcal{S}_1$ and $\mathcal{S}_2$ to that of checking symbolic (bi)similarity on $N(\mathcal{S}_1)$ and $N(\mathcal{S}_2)$, with minor modifications to the definition. To do this, we need to assume that minterms for both $N(\mathcal{S}_1)$ and $N(\mathcal{S}_2)$ are computed over the union of the predicates of $\mathcal{S}_1$ and $\mathcal{S}_2$.


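Definition 6 (N-simulation). An N-simulation on $\mathcal{S}_1$ and $\mathcal{S}_2$ is a relation $R \subseteq N(Q_1) \times N(Q_2) \times \mathcal{P}(R_1 \times R_2)$, defined as in Definition 5, with the following modifications:

(i) we require that $\theta_1 \rhd p_1 \xrightarrow{\varphi_1/\ell_1} \theta_1' \rhd q_1 \in N(\Delta_1)$ must be matched by transitions $\theta_2 \rhd p_2 \xrightarrow{\varphi_2/\ell_2} \theta_2' \rhd q_2 \in N(\Delta_2)$ such that $\varphi_2 = \varphi_1$.

(ii) we modify case 2 as follows (the changes with respect to Definition 5 are the condition $\varphi_1 = \theta_2(s)$ in 2(a)' and the premise of 2(b)'):
2(a)' for all $s \in R_2 \setminus img(\sigma)$ such that $\varphi_1 = \theta_2(s)$, there is $\theta_2 \rhd p_2 \xrightarrow{\varphi_1/s^\bullet} \theta_2' \rhd q_2 \in N(\Delta_2)$ such that $(\theta_1' \rhd q_1, \theta_2' \rhd q_2, \sigma[r \mapsto s]) \in R$, and
2(b)' if $\#(\theta_1, \varphi_1) + \#(\theta_2, \varphi_1) < |[\![\varphi_1]\!]|$, then there is $\theta_2 \rhd p_2 \xrightarrow{\varphi_1/s^\circ} \theta_2' \rhd q_2 \in N(\Delta_2)$ such that $(\theta_1' \rhd q_1, \theta_2' \rhd q_2, \sigma[r \mapsto s]) \in R$.

An N-bisimulation $R$ is a relation such that both $R$ and $R^{-1}$ are N-simulations. We write $\mathcal{S}_1 \precsim_N \mathcal{S}_2$ (resp. $\mathcal{S}_1 \sim_N \mathcal{S}_2$) if there is an N-simulation (resp. N-bisimulation) $R$ such that $(N(q_{01}), N(q_{02}), v_{01} \bowtie v_{02}) \in R$.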
The intuition behind this definition is as follows. Recall that, in a normalized SRA, transitions are defined over minterms, which cannot be further broken down and are mutually disjoint. Therefore two transitions can read the same values if and only if they have the same minterm guard. Thus condition (i) makes sure that matching transitions can read exactly the same set of values. Analogously, condition (ii) restricts how a fresh transition of $N(\mathcal{S}_1)$ must be matched by one of $N(\mathcal{S}_2)$. Condition 2(a)' only considers transitions of $N(\mathcal{S}_2)$ reading registers $s \in R_2$ such that $\varphi_1 = \theta_2(s)$, because, by definition of normalized SRA, $\theta_2 \rhd p_2$ has no such transition if this condition is not met. Condition 2(b)' amounts to requiring a fresh transition of $N(\mathcal{S}_2)$ that is enabled by both $\theta_1$ and $\theta_2$ (see Lemma 1), i.e., one that can read a symbol that is fresh w.r.t. both $N(\mathcal{S}_1)$ and $N(\mathcal{S}_2)$.
N-simulation is sound and complete for standard simulation.


Theorem 3. $\mathcal{S}_1 \precsim \mathcal{S}_2$ if and only if $\mathcal{S}_1 \precsim_N \mathcal{S}_2$.


As a consequence, we can decide similarity of SRAs via their normalized versions. N-simulation is a relation over a finite set, namely $N(Q_1) \times N(Q_2) \times \mathcal{P}(R_1 \times R_2)$, therefore N-similarity can always be decided in finite time. We can leverage this result to provide algorithms for checking language inclusion/equivalence for deterministic SRAs (recall that they are undecidable for non-deterministic ones).


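Theorem 4. Given two deterministic SRAs $\mathcal{S}_1$ and $\mathcal{S}_2$, there are algorithms to decide $\mathcal{L}(\mathcal{S}_1) \subseteq \mathcal{L}(\mathcal{S}_2)$ and $\mathcal{L}(\mathcal{S}_1) = \mathcal{L}(\mathcal{S}_2)$.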


Proof. By Proposition 1 and Theorem 3, we can decide $\mathcal{L}(\mathcal{S}_1) \subseteq \mathcal{L}(\mathcal{S}_2)$ by checking $\mathcal{S}_1 \precsim_N \mathcal{S}_2$. This can be done algorithmically by iteratively building a relation $R$ on triples that is an N-simulation on $N(\mathcal{S}_1)$ and $N(\mathcal{S}_2)$. The algorithm initializes $R$ with $(N(q_{01}), N(q_{02}), v_{01} \bowtie v_{02})$, as this is required to be in $R$ by Definition 6. Each iteration considers a candidate triple $t$ and checks the conditions for N-simulation. If they are satisfied, it adds $t$ to $R$, computes the next set of candidate triples (i.e., those which are required to belong to the simulation relation), and adds them to the list of triples still to be processed. If not, the algorithm returns $\mathcal{L}(\mathcal{S}_1) \not\subseteq \mathcal{L}(\mathcal{S}_2)$. The algorithm terminates returning $\mathcal{L}(\mathcal{S}_1) \subseteq \mathcal{L}(\mathcal{S}_2)$ when no triples are left to process. Determinism of $\mathcal{S}_1$ and $\mathcal{S}_2$, and hence of $N(\mathcal{S}_1)$ and $N(\mathcal{S}_2)$ (by Proposition 6), ensures that computing candidate triples is deterministic. To decide $\mathcal{L}(\mathcal{S}_1) = \mathcal{L}(\mathcal{S}_2)$, at each iteration we need to check that both $t$ and $t^{-1}$ satisfy the conditions for N-simulation.
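The worklist procedure can be sketched in Python, simplified to plain deterministic RAs (all guards ⊤). The sketch therefore checks the conditions of Definition 5, not the full N-simulation of Definition 6 with minterms. Several parts are choices made for this illustration:

- the encoding of automata;
- the function names;
- the assumption that every register of the second automaton always holds a value, so that case 2(a) applies literally.

The function decides whether the triples required from `start` form a symbolic simulation.

```python
from collections import deque
from typing import Dict, Hashable, List, Optional, Set, Tuple

# Hypothetical encoding of a deterministic RA (every guard is ⊤):
# delta[p] maps a label to the unique target, where ("old", r) reads
# register r and ("fresh", r) reads a fresh value and stores it in r.
Label = Tuple[str, str]
Delta = Dict[Hashable, Dict[Label, Hashable]]
Triple = Tuple[Hashable, Hashable, Dict[str, str]]

def _update(sigma: Dict[str, str], r: str, s: str) -> Dict[str, str]:
    out = {a: b for a, b in sigma.items() if b != s}
    out[r] = s
    return out

def _fresh_move(moves: Dict[Label, Hashable]) -> Optional[Tuple[str, Hashable]]:
    """(register, target) of the unique fresh transition, if there is one."""
    for (kind, reg), target in moves.items():
        if kind == "fresh":
            return reg, target
    return None

def simulates(delta1: Delta, final1: Set[Hashable], delta2: Delta,
              final2: Set[Hashable], regs2: Set[str], start: Triple) -> bool:
    """Worklist check of Definition 5, assuming S2 never has empty registers."""
    seen = {(start[0], start[1], frozenset(start[2].items()))}
    todo = deque([start])
    while todo:
        p1, p2, sigma = todo.popleft()
        if p1 in final1 and p2 not in final2:
            return False
        moves2 = delta2.get(p2, {})
        fresh2 = _fresh_move(moves2)
        required: List[Triple] = []
        for (kind, r), q1 in delta1.get(p1, {}).items():
            if kind == "old" and r in sigma:                     # case 1(a)
                if ("old", sigma[r]) not in moves2:
                    return False
                required.append((q1, moves2[("old", sigma[r])], sigma))
            else:                                                # cases 1(b), 2(b)
                if fresh2 is None:
                    return False
                s, q2 = fresh2
                required.append((q1, q2, _update(sigma, r, s)))
            if kind == "fresh":                                  # case 2(a)
                for s in regs2 - set(sigma.values()):
                    if ("old", s) not in moves2:
                        return False
                    required.append((q1, moves2[("old", s)], _update(sigma, r, s)))
        for t in required:
            key = (t[0], t[1], frozenset(t[2].items()))
            if key not in seen:
                seen.add(key)
                todo.append(t)
    return True
```

In the usage below, the start triple `(0, 0, {"r": "r"})` says that both automata begin with equal values in their register r. Each triple enters the worklist at most once, which matches the bound on iterations discussed next.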

If $\mathcal{S}_1$ and $\mathcal{S}_2$ have, respectively, $n_1, n_2$ states, $m_1, m_2$ transitions, and $r_1, r_2$ registers, the normalized versions have $O(n_1 r_1 2^{m_1})$ and $O(n_2 r_2 2^{m_2})$ states. Each triple, taken from the finite set $N(Q_1) \times N(Q_2) \times \mathcal{P}(R_1 \times R_2)$, is processed at most once, so the algorithm iterates $O(n_1 n_2 r_1 r_2 2^{m_1 + m_2 + r_1 r_2})$ times.


5 Evaluation 


We have implemented SRAs in the open-source Java library SVPALib [26]. In our implementation, constructions are computed lazily when possible (e.g., the normalized SRA for emptiness and (bi)similarity checks). All experiments were performed on a machine with a 3.5 GHz Intel Core i7 CPU and 16 GB of RAM (8 GB for the JVM), with a timeout of 300 s. The goal of our evaluation is to answer the following research questions:


Q1: Are SRAs more succinct than existing models when processing strings over 
large but finite alphabets? (Sect. 5.1) 

Q2: What is the performance of membership for deterministic SRAs and how 
does it compare to the matching algorithm in java.util.regex? (Sect. 5.2) 

Q3: Are SRA decision procedures practical? (Sect. 5.3) 


Benchmarks. We focus on regular expressions with back-references, therefore all our benchmarks operate over the Boolean algebra of Unicode characters with intervals, i.e., the set of characters is the set of all $2^{16}$ UTF-16 characters and the predicates are unions of intervals (e.g., [a-zA-Z]).² Our benchmark set contains 19 SRAs that represent variants of regular expressions with back-references obtained from the regular-expression crowd-sourcing website RegExLib [23]. The expressions check whether inputs have, for example, matching first/last name initials or both (Name-F, Name-L and Name), correct Product Codes/Lot number of total length n (Pr-Cn, Pr-CLn), matching XML tags (XML), and IP addresses that match for n positions (IPn). We also create variants of the product benchmark presented in Sect. 2 where we vary the numbers of characters in the code and lot number. All the SRAs are deterministic.


5.1 Succinctness of SRAs vs SFAs 


In this experiment, we relate the size of SRAs over finite alphabets to the size 
of the smallest equivalent SFAs. For each SRA, we construct the equivalent 
SFA by equipping the state space with the values stored in the registers at each 
step (this construction effectively builds the configuration LTS). Figure 2a shows 
the results. As expected, SFAs tend to blow up in size when the SRA contains 
multiple registers or complex register values. In cases where the register values 
range over small sets (e.g., [0-9]) it is often feasible to build an SFA equivalent 
to the SRA, but the construction always yields very large automata. In cases 
where the registers can assume many values (e.g., $2^{16}$), SFAs become prohibitively
large and do not fit in memory. To answer Q1, even for finite alphabets, it is 
not feasible to compile SRAs to SFAs. Hence, SRAs are a succinct model. 


5.2 Performance of Membership Checking 


In this experiment, we measure the performance of SRA membership, and we 
compare it with the performance of the java.util.regex matching algorithm. 


² Our experiments are over finite alphabets, but the Boolean algebra can be infinite by taking the alphabet to be positive integers and allowing intervals to contain $\infty$ as upper bound. This modification does not affect the running time of our procedures, therefore we do not report it.
(a) Size of SRAs vs SFAs. (-) denotes the SFA didn't fit in memory. |reg| denotes how many different characters a register stored. (?) marks values lost in extraction.

Benchmark  SRA states  SRA tr  reg  |reg|  SFA states  SFA tr
IP2        44          46      3    10     4,013       4,312
IP3        44          46      4    10     39,113      42,112
IP4        44          46      5    10     372,113     402,112
IP6        44          46      7    10     -           -
IP9        44          46      10   10     -           -
Name-F     7           10      2    26     201         300
Name-L     7           10      2    26     129         180
Name       7           10      3    26     3,201       4,500
XML        12          16      4    52     -           -
Pr-C2      26          28      3    2^16   -           -
Pr-C3      28          30      4    2^16   -           -
Pr-C4      30          32      5    2^16   -           -
Pr-C6      ?           ?       ?    2^16   -           -
Pr-C9      ?           ?       ?    2^16   -           -
Pr-CL2     26          28      3    2^16   -           -
Pr-CL3     28          30      4    2^16   -           -
Pr-CL4     30          32      5    2^16   -           -
Pr-CL6     34          36      7    2^16   -           -
Pr-CL9     40          42      10   2^16   -           -

(b) Performance of decision procedures. In the table $\mathcal{L}_i = \mathcal{L}(\mathcal{S}_i)$, for $i = 1, 2$. The headers of the three timing columns, (i) to (iii), are not recoverable from the source.

SRA S1   SRA S2   (i)       (ii)      (iii)
Pr-C2    Pr-CL2   0.125s    0.905s    3.426s
Pr-C3    Pr-CL3   1.294s    5.558s    24.688s
Pr-C4    Pr-CL4   13.577s   55.595s   -
Pr-C6    Pr-CL6   -         -         -
Pr-CL2   Pr-C2    1.067s    0.952s    0.889s
Pr-CL3   Pr-C3    10.998s   11.104s   11.811s
Pr-CL4   Pr-C4    -         -         -
Pr-CL6   Pr-C6    -         -         -
IP-2     IP-3     0.125s    0.408s    1.845s
IP-3     IP-4     1.288s    2.953s    21.627s
IP-4     IP-6     18.440s   42.727s   -
IP-6     IP-9     -         -         -

(c) SRA membership and Java regex matching performance (time against input length, both axes in log scale). Missing data points for Java are stack overflows.

Fig. 2. Experimental results.


For each benchmark, we generate inputs of length varying between approximately 100 and $10^8$ characters and measure the time taken to check membership. Figure 2c shows the results. The performance of SRA (resp. Java) is not
particularly affected by the size of the expression. Hence, the lines for different 
expressions mostly overlap. As expected, for SRAs the time taken to check mem- 
bership grows linearly in the size of the input (axes are log scale). Remarkably, 
even though our implementation does not employ particular input processing 
optimizations, it can still check membership for strings with tens of millions of 
characters in less than 10s. We have found that our implementation is more 
efficient than the Java regex library, matching the same input an average of 
50 times faster than java.util.regex.Matcher. java.util.regex.Matcher 
seems to make use of a recursive algorithm to match back-references, which 
means it does not scale well. Even when given the maximum stack size, the 
JVM throws a StackOverflowError for inputs as small as 20,000 characters. Our
implementation can match such strings in less than 2s. To answer Q2, deter- 
ministic SRAs can be efficiently executed on large inputs and perform 
better than the java.util.regex matching algorithm. 
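The Python runner below illustrates why membership for deterministic SRAs is linear in the input. It uses a deliberately simplified SRA encoding assumed for this example, not the SVPALib API. Each transition carries a guard, an optional register that the input must equal, and an optional register in which to store the input. Freshness constraints are omitted. At most one transition is enabled per step, so the run makes a single pass over the word. Python's `re` module also supports back-references and serves as a cross-check.

```python
import re
from typing import Callable, Dict, List, Optional, Set, Tuple

# (guard, register the symbol must equal or None, register to store into or None, target)
Transition = Tuple[Callable[[str], bool], Optional[str], Optional[str], int]

def accepts(delta: Dict[int, List[Transition]], initial: int,
            final: Set[int], word: str) -> bool:
    """Single left-to-right pass; assumes at most one enabled transition per step."""
    state, regs = initial, {}
    for ch in word:
        for guard, must_equal, store, target in delta.get(state, []):
            if guard(ch) and (must_equal is None or regs.get(must_equal) == ch):
                if store is not None:
                    regs[store] = ch
                state = target
                break
        else:
            return False  # no enabled transition: reject
    return state in final

def lower(c: str) -> bool:
    return "a" <= c <= "z"

def digit(c: str) -> bool:
    return "0" <= c <= "9"

# Hypothetical SRA for the back-reference pattern ([a-z])[0-9]*\1
BACKREF: Dict[int, List[Transition]] = {
    0: [(lower, None, "r", 1)],   # ([a-z]): remember the letter in r
    1: [(digit, None, None, 1),   # [0-9]*
        (lower, "r", None, 2)],   # \1: the letter must equal r
}

if __name__ == "__main__":
    for w in ["a123a", "a123b", "bb"]:
        print(w, accepts(BACKREF, 0, {2}, w), bool(re.fullmatch(r"([a-z])[0-9]*\1", w)))
```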
5.3 Performance of Decision Procedures


In this experiment, we measure the performance of the SRA simulation and bisimulation algorithms. Since all our SRAs are deterministic, these two checks correspond to language inclusion and equivalence, respectively. We select pairs of benchmarks
for which the above tests are meaningful (e.g., variants of the problem discussed 
at the end of Sect. 2). The results are shown in Fig. 2b. As expected, due to the 
translation to single-valued SRAs, our decision procedures do not scale well in 
the number of registers. This is already the case for classic register automata 
and it is not a surprising result. However, our technique can still check equiva- 
lence and inclusion for regular expressions that no existing tool can handle. To 
answer Q3, bisimulation and simulation algorithms for SRAs only scale 
to small numbers of registers. 


6 Conclusions 


In this paper we have presented Symbolic Register Automata, a novel class of 
automata that can handle complex alphabet theories while allowing symbol comparisons for equality. SRAs encompass, and are strictly more powerful than, both Register and Symbolic Automata. We have shown that they enjoy the same closure and decidability properties as the former, despite the presence of arbitrary guards on transitions, which are not allowed by RAs. Via a comprehensive
set of experiments, we have concluded that SRAs are vastly more succinct than 
SFAs and membership is efficient on large inputs. Decision procedures do not 
scale well in the number of registers, which is already the case for basic RAs. 


Related Work. RAs were first introduced in [17]. There is an extensive lit- 
erature on register automata, their formal languages and decidability proper- 
ties [7, 18,21, 22,25], including variants with global freshness [20,27] and totally 
ordered data [4,14]. SRAs are based on the original model of [17], but are much 
more expressive, due to the presence of guards from an arbitrary decidable 
theory. 

In recent work, variants over richer theories have appeared. In [9], RAs over rationals were introduced. They allow for a restricted form of linear arithmetic
among registers (RAs with arbitrary linear arithmetic subsume two-counter 
automata, hence are undecidable). SRAs do not allow for operations on reg- 
isters, but encompass a wider range of theories without any loss in decidability. 
Moreover, [9] does not study Boolean closure properties. In [8,16], RAs allowing guards over a range of theories (including (in)equality, total orders and increments/sums) are studied. Their focus is different from ours as they are
interested primarily in active learning techniques, and several restrictions are 
placed on models for the purpose of the learning process. We can also relate 
SRAs with Quantified Event Automata [2], which allow for guards and assign- 
ments to registers on transitions. However, in QEA guards can be arbitrary, 
which could lead to several problems, e.g. undecidable equivalence. 
Symbolic automata were first introduced in [28] and many variants of them have been proposed [12]. The one closest to SRAs is Symbolic Extended Finite Automata (SEFA) [11]. SEFAs are SFAs in which transitions can read more than one character at a time. A transition of arity $k$ reads $k$ symbols which are consumed if they satisfy the predicate $\varphi(a_1, \ldots, a_k)$. SEFAs allow arbitrary
k-ary predicates over the input theory, which results in most problems being 
undecidable (e.g., equivalence and intersection emptiness) and in the model not 
being closed under Boolean operations. Even when deterministic, SEFAs are 
not closed under union and intersection. In terms of expressiveness, SRAs and 
SEFAs are incomparable. SRAs can only use equality, but can compare symbols 
at arbitrary points in the input while SEFAs can only compare symbols within 
a constant window, but using arbitrary predicates. 

Several works study matching techniques for extended regular expres- 
sions [3,5,18,24]. These works introduce automata models with ad-hoc features 
for extended regular constructs (including back-references) but focus on efficient matching, without studying closure and decidability properties. It is also
worth noting that SRAs are not limited to alphanumeric or finite alphabets. 
On the negative side, SRAs cannot express capturing groups of an unbounded 
length, due to the finitely many registers. This limitation is essential for 
decidability. 


Future Work. In [21] a polynomial algorithm for checking language equivalence 
of deterministic RAs is presented. This crucially relies on closure properties of 
symbolic bisimilarity, some of which are lost for SRAs. We plan to investigate 
whether this algorithm can be adapted to our setting. Extending SRAs with 
comparison operators other than equality (e.g., a total order $<$)
is an interesting research question, but most extensions of the model quickly 
lead to undecidability. We also plan to study active automata learning for SRAs, 
building on techniques for SFAs [1], RAs [6,8,16] and nominal automata [19]. 
Abstract. We present abstraction-refinement algorithms for model
checking safety properties of timed automata. The abstraction domain 
we consider abstracts away zones by restricting the set of clock con- 
straints that can be used to define them, while the refinement procedure 
computes the set of constraints that must be taken into consideration 
in the abstraction so as to exclude a given spurious counterexample. 
We implement this idea in two ways: an enumerative algorithm where 
a lazy abstraction approach is adopted, meaning that possibly different 
abstract domains are assigned to each exploration node; and a symbolic 
algorithm where the abstract transition system is encoded with Boolean 
formulas. 


1 Introduction 


Model checking [4,10,12,26] is an automated technique for verifying that the 
set of behaviors of a computer system satisfies a given property. Model-checking 
algorithms explore finite-state automata (representing the system under study) 
in order to decide if the property holds; if not, the algorithm returns an explana- 
tion. These algorithms have been extended to verify real-time systems modelled 
as timed automata [2,3], an extension of finite automata with clock variables to 
measure and constrain the amount of time elapsed between occurrences of transi- 
tions. The state-space exploration can be done by representing clock constraints 
efficiently using convex polyhedra called zones [8,9]. Algorithms based on this 
data structure have been implemented in several tools such as Uppaal [7], and 
have been applied in various industrial cases. 

The well-known issue in the applications of model checking is the state-space 
explosion problem: the size of the state space grows exponentially in the size 
of the description of the system. There are several sources for this explosion: 
the system might be made of the composition of several subsystems (such as 
a distributed system), it might contain several discrete variables (such as in a 
piece of software), or it might contain a number of real-valued clocks as in our 
case. 
grant EQualIS (StG-308087).
Numerous attempts have been made to circumvent this problem. Abstraction is a generic approach that consists in simplifying the model under study,
so as to make it easier to verify [13]. Existential abstraction may only add extra 
behaviors, so that when a safety property holds in an abstracted model, it also 
holds in the original model; if on the other hand a safety property fails to hold, 
the model-checking algorithms return a witness trace exhibiting the non-safe 
behaviour: this either invalidates the property on the original model, if the trace 
exists in that model, or gives information about how to automatically refine the 
abstraction. This approach, named CEGAR (counter-example guided abstrac- 
tion refinement) [11], was further developed and used, for instance, in software 
verification (BLAST [20], SLAM [5], ...). 

The CEGAR approach has been adapted to timed automata, e.g. in [14,18], but the abstractions considered there only consist in removing clocks and
discrete variables, and adding them back during refinement. So for most well- 
designed models, one ends up adding all clocks and variables which renders the 
method useless. Two notable exceptions are [22], in which the zone extrapolation 
operators are dynamically adapted during the exploration, and [29], in which 
zones are refined when needed using interpolants. Both approaches define “exact” 
abstractions in the sense that they make sure that all traces discovered in the 
abstract model are feasible in the concrete model at any time. 

In this work, we consider a more general setting and study predicate abstrac- 
tions on clock variables. Just like in software model checking, we define abstract 
state spaces using these predicates, where the values of the clocks and their 
relations are approximately represented by these predicates. New predicates are 
generated if needed during the refinement step. We instantiate our approach by 
two algorithms. The first one is a zone-based enumerative algorithm inspired by 
the lazy abstraction in software model checking [19], where we assign a possibly 
different abstract domain to each node in the exploration. The second algorithm 
is based on binary decision diagrams (BDD): by exploiting the observation that a 
small number of predicates was often sufficient to prove safety properties, we use 
an efficient BDD encoding of zones similar to one introduced in early work [28]. 

Let us explain the abstract domains we consider. Assume there are two clock 
variables $x$ and $y$. The abstraction we consider consists in restricting the clock constraints that can be used when defining zones. Assume that we only allow $x$ to be compared with 2 or 3, $y$ only with 2, and $x - y$ only with $-1$ or 2. Then any conjunction of constraints one might obtain in this manner will be delimited by the thick red lines in Fig. 1; one cannot define a finer region under this restriction. The figure shows the abstraction process: given a "concrete" zone, its abstraction is the smallest zone which is a superset and is definable under our restriction. For instance, the abstraction of $1 \le x, y \le 2$ is $0 \le x, y \le 2 \wedge -1 \le x - y$ (cf. Fig. 1a).

[Fig. 1 panels: (a) Abstraction of zone $1 \le x, y \le 2$; (b) Abstraction of zone $y \le 1 \wedge 1 \le x - y \le 2$.]

Fig. 1. The abstract domain is defined by the clock constraints shown in thick red lines. In each example, the abstraction of the zone shown on the left (shaded area) is the larger zone on the right. (Color figure online)


Related Works. We give more detail on zone abstractions in timed automata. 
Most efforts in the literature have been concentrated in designing zone abstrac- 
tion operators that are exact in the sense that they preserve the reachability 
relation between the locations of a timed automaton; see [6]. The idea is to 
determine bounds on the constants to which a given clock can be compared
in a given part of the automaton, since the clock values do not matter outside 
these bounds. In [21,22], the authors give an algorithm where these bounds are 
dynamically adapted during the exploration, which allows one to obtain coarser 
abstractions. In [29], the exploration tree contains pairs of zones: a concrete zone 
as in the usual algorithm, and a coarser abstract zone. The algorithm explores 
all branches using the coarser zone and immediately refines the abstract zone 
whenever an edge which is disabled in the concrete zone is enabled. In [17], a 
CEGAR loop was used to solve timed games by analyzing strategies computed 
for each abstract game. The abstraction consisted in collapsing locations. 

Some works have adapted the abstraction-refinement paradigm to timed 
automata. In [14], the authors apply “localization reduction” to timed automata 
within an abstraction-refinement loop: they abstract away clocks and discrete 
variables, and only introduce them as they are needed to rule out spurious coun- 
terexamples. A more general but similar approach was developed in [18]. In [31], 
the authors adapt the trace abstraction refinement idea to timed automata where 
a finite automaton is maintained to rule out infeasible edge sequences. 

The CEGAR approach was also used recently in the LinAIG framework for 
verifying linear hybrid automata [1]. In this work, the backward reachability algo- 
rithm exploits don’t-cares to reduce the size of the Boolean circuits representing 
the state space. The abstractions consist in enlarging the size of don’t-cares to 
reduce the number of linear predicates used in the representation. 


2 Timed Automata and Zones 


2.1 Timed Automata 


Given a finite set of clocks $C$, we call valuations the elements of $\mathbb{R}_{\ge 0}^C$. For a clock valuation $v$, a subset $R \subseteq C$, and a non-negative real $d$, we denote with $v[R \leftarrow d]$ the valuation $w$ such that $w(x) = v(x)$ for $x \in C \setminus R$ and $w(x) = d$ for $x \in R$, and with $v + d$ the valuation $w'$ such that $w'(x) = v(x) + d$ for all $x \in C$. We extend these operations to sets of valuations in the obvious way. We write $\mathbf{0}$ for the valuation that assigns 0 to every clock. An atomic guard is a formula of the form $x \prec k$ or $x - y \prec k$ with $x, y \in C$, $k \in \mathbb{N}$, and ${\prec} \in \{<, \le, >, \ge\}$. A guard is a conjunction of atomic guards. A valuation $v$ satisfies a guard $g$, denoted $v \models g$, if all atomic guards hold true when each $x \in C$ is replaced with $v(x)$. Let $[\![g]\!] = \{v \in \mathbb{R}_{\ge 0}^C \mid v \models g\}$ denote the set of valuations satisfying $g$. We write $\Phi_C$ for the set of guards built on $C$.
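The valuation operations and guard satisfaction translate directly into code. In this Python sketch, valuations are dictionaries from clock names to floats. By our own choice, an atomic guard is encoded as a tuple `(x, y, op, k)` meaning $x - y \prec k$, with `y = None` for the one-clock form $x \prec k$.

```python
import operator
from typing import Dict, Iterable, List, Optional, Tuple

Valuation = Dict[str, float]
Atomic = Tuple[str, Optional[str], str, int]  # (x, y, op, k): x - y op k

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}

def reset(v: Valuation, clocks: Iterable[str], d: float = 0.0) -> Valuation:
    """v[R <- d]: clocks in R take value d, the others are unchanged."""
    w = dict(v)
    for x in clocks:
        w[x] = d
    return w

def delay(v: Valuation, d: float) -> Valuation:
    """v + d: all clocks advance by the same non-negative amount."""
    if d < 0:
        raise ValueError("delays must be non-negative")
    return {x: t + d for x, t in v.items()}

def satisfies(v: Valuation, guard: List[Atomic]) -> bool:
    """v |= g for a guard g given as a list (conjunction) of atomic guards."""
    for x, y, op, k in guard:
        lhs = v[x] - (v[y] if y is not None else 0.0)
        if not OPS[op](lhs, k):
            return False
    return True
```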

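A timed automaton $\mathcal{A}$ is a tuple $(\mathcal{L}, Inv, \ell_0, C, E)$, where $\mathcal{L}$ is a finite set of locations, $Inv \colon \mathcal{L} \to \Phi_C$ defines location invariants, $C$ is a finite set of clocks, $E \subseteq \mathcal{L} \times \Phi_C \times 2^C \times \mathcal{L}$ is a set of edges, and $\ell_0 \in \mathcal{L}$ is the initial location. An edge $e = (\ell, g, R, \ell')$ is also written as $\ell \xrightarrow{g, R} \ell'$. For any location $\ell$, we let $E(\ell)$ denote the set of edges leaving $\ell$.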

A configuration of $\mathcal{A}$ is a pair $q = (\ell, v) \in \mathcal{L} \times \mathbb{R}_{\ge 0}^C$ such that $v \models Inv(\ell)$. A run of $\mathcal{A}$ is a sequence $q_1 e_1 q_2 e_2 \ldots q_n$ where for all $i \ge 1$, $q_i = (\ell_i, v_i)$ is a configuration, and either $e_i \in \mathbb{R}_{>0}$, in which case $q_{i+1} = (\ell_i, v_i + e_i)$, or $e_i = (\ell_i, g_i, R_i, \ell_{i+1}) \in E$, in which case $v_i \models g_i$ and $q_{i+1} = (\ell_{i+1}, v_i[R_i \leftarrow 0])$. A path is a sequence of edges with matching endpoint locations.
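A run can be validated step by step against this definition. The Python sketch below uses a simplified encoding chosen for illustration:

- guards and invariants are plain predicates on valuations;
- an edge is a tuple `(source, guard, resets, target)`;
- a delay is a positive number.

As in the definition, only the configurations visited by the run are required to satisfy the invariants.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple, Union

Valuation = Dict[str, float]
Pred = Callable[[Valuation], bool]
Edge = Tuple[str, Pred, FrozenSet[str], str]  # (source, guard, resets, target)
Step = Union[float, Edge]                     # a delay e_i > 0 or an edge

def is_run(invariant: Dict[str, Pred], start: Tuple[str, Valuation],
           steps: List[Step]) -> bool:
    """Check that start, e_1, q_2, e_2, ... is a run of the timed automaton."""
    loc, v = start
    if not invariant[loc](v):
        return False  # q_1 must be a configuration
    for step in steps:
        if isinstance(step, (int, float)):
            if step <= 0:
                return False
            v = {x: t + step for x, t in v.items()}            # (l_i, v_i + e_i)
        else:
            src, guard, resets, dst = step
            if src != loc or not guard(v):
                return False                                    # need v_i |= g_i
            loc = dst
            v = {x: 0.0 if x in resets else t for x, t in v.items()}  # v_i[R_i <- 0]
        if not invariant[loc](v):
            return False                                        # q_{i+1} is a configuration
    return True
```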


2.2 Zones and DBMs 


Several tools for timed automata implement algorithms based on zones, which are particular polyhedra definable with clock constraints. Formally, a zone $Z$ is a subset of $\mathbb{R}_{\ge 0}^C$ definable by a guard in $\Phi_C$.

We recall a few basic operations defined on zones. First, the intersection $Z \cap Z'$ of two zones $Z$ and $Z'$ is clearly a zone. Given a zone $Z$, the set of time-successors of $Z$, defined as $Z^{\uparrow} = \{v + t \in \mathbb{R}_{\ge 0}^C \mid t \in \mathbb{R}_{\ge 0}, v \in Z\}$, is easily seen to be a zone; similarly for time-predecessors $Z^{\downarrow} = \{v \in \mathbb{R}_{\ge 0}^C \mid \exists t \ge 0.\ v + t \in Z\}$. Given $R \subseteq C$, we let $Reset_R(Z)$ be the zone $\{v[R \leftarrow 0] \in \mathbb{R}_{\ge 0}^C \mid v \in Z\}$, and $Free_x(Z) = \{v \in \mathbb{R}_{\ge 0}^C \mid \exists v' \in Z, \exists d \in \mathbb{R}_{\ge 0},\ v' = v[x \leftarrow d]\}$.

Zones can be represented as difference-bound matrices (DBM) [8,15]. Let $C_0 = C \cup \{0\}$, where 0 is an extra symbol representing a special clock variable whose value is always 0. A DBM is a $|C_0| \times |C_0|$-matrix taking values in $(\mathbb{Z} \times \{<, \le\}) \cup \{(+\infty, <)\}$. Intuitively, cell $(x, y)$ of a DBM $M$ stores a pair $(d, \prec)$ representing an upper bound on the difference $x - y$. For any DBM $M$, we let $[\![M]\!]$ denote the zone it defines.

While several DBMs can represent the same zone, each zone admits a canon- 
ical representation, which is obtained by storing the tightest clock constraints 
defining the zone. This canonical representation can be obtained by computing shortest paths in a graph where the vertices are clocks and the edges are weighted by clock constraints, with natural addition and comparison of elements of $(\mathbb{Z} \times \{<, \le\}) \cup \{(+\infty, <)\}$. This graph has a negative cycle if, and only if, the associated DBM represents the empty zone.
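The canonical form and the emptiness test can be sketched in Python. As an assumption of this sketch, bounds are encoded as pairs `(k, flag)` with `flag = 1` for $\le$ and `flag = 0` for $<$. With this encoding, Python's tuple order coincides with the order on bounds. Index 0 of the matrix is the reference clock 0. The sketch recomputes all shortest paths with Floyd-Warshall, which takes $O(|C_0|^3)$ time.

```python
import math
from typing import List, Tuple

Bound = Tuple[float, int]   # (k, 1) encodes "<= k", (k, 0) encodes "< k"
INF: Bound = (math.inf, 0)  # (+infinity, <): no constraint
LE0: Bound = (0, 1)         # (0, <=)

def add(a: Bound, b: Bound) -> Bound:
    """Sum of two bounds; the result is strict if either operand is strict."""
    if a[0] == math.inf or b[0] == math.inf:
        return INF
    return (a[0] + b[0], min(a[1], b[1]))

def unconstrained(n_clocks: int) -> List[List[Bound]]:
    """DBM over C0 = {0, 1, ..., n} describing all non-negative valuations."""
    n = n_clocks + 1
    m = [[INF] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = LE0
        m[0][i] = LE0  # 0 - x_i <= 0, i.e. x_i >= 0
    return m

def canonicalize(m: List[List[Bound]]) -> List[List[Bound]]:
    """Tightest equivalent DBM: all-pairs shortest paths (Floyd-Warshall)."""
    n = len(m)
    d = [row[:] for row in m]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = add(d[i][k], d[k][j])
                if via_k < d[i][j]:
                    d[i][j] = via_k
    return d

def is_empty(m: List[List[Bound]]) -> bool:
    """Empty iff the closure exposes a negative cycle (a diagonal below (0, <=))."""
    d = canonicalize(m)
    return any(d[i][i] < LE0 for i in range(len(d)))
```

For example, starting from `unconstrained(2)` and adding $x \le 2$ and $y - x \le 1$, the canonical form contains the implied bound $y \le 3$ in cell $(y, 0)$.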

All the operations on zones can be performed efficiently (in $O(|C_0|^3)$) on their associated DBMs while maintaining reduced form. For instance, the intersection $N = Z \cap Z'$ of two canonical DBMs $Z$ and $Z'$ can be obtained by first computing the DBM $M = \min(Z, Z')$ such that $M(x, y) = \min\{Z(x, y), Z'(x, y)\}$ for all $(x, y) \in C_0^2$, and then turning $M$ into canonical form. We refer to [8] for full details. By a slight abuse of notation, we use the same notations for DBMs as for zones, writing e.g. $M' = M^{\uparrow}$, where $M$ and $M'$ are reduced DBMs such that $[\![M']\!] = [\![M]\!]^{\uparrow}$. Given an edge $e = (\ell, g, R, \ell')$ and a zone $Z$, we define $Post_e(Z) = Inv(\ell') \cap (Reset_R(g \cap Z))^{\uparrow}$ and $Pre_e(Z) = (g \cap Free_R(Inv(\ell') \cap Z))^{\downarrow}$. For a path $\rho = e_1 e_2 \ldots e_n$, we define $Post_\rho$ and $Pre_\rho$ by iteratively applying $Post_{e_i}$ and $Pre_{e_i}$, respectively.
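The zone operations behind the successor computation can be sketched on the same kind of matrices. This Python sketch reuses the bound encoding `(k, flag)`, with flag 1 for $\le$ and 0 for $<$. For simplicity it closes matrices with Floyd-Warshall, and it assumes the inputs of `up` and `reset` are canonical. The hypothetical helper `post` computes the successor zone as $Inv(\ell') \cap (Reset_R(g \cap Z))^{\uparrow}$. The guard is applied before the reset, as in the run semantics above.

```python
import math
from typing import List, Set, Tuple

Bound = Tuple[float, int]     # (k, 1) is "<= k", (k, 0) is "< k"
INF: Bound = (math.inf, 0)    # (+infinity, <)
DBM = List[List[Bound]]       # row/column 0 is the reference clock 0

def _add(a: Bound, b: Bound) -> Bound:
    if a[0] == math.inf or b[0] == math.inf:
        return INF
    return (a[0] + b[0], min(a[1], b[1]))

def _close(m: DBM) -> DBM:
    """Canonical form via Floyd-Warshall."""
    d = [row[:] for row in m]
    n = len(d)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                d[i][j] = min(d[i][j], _add(d[i][k], d[k][j]))
    return d

def intersect(z1: DBM, z2: DBM) -> DBM:
    """Z1 ∩ Z2: cell-wise minimum, then canonical form."""
    n = len(z1)
    return _close([[min(z1[i][j], z2[i][j]) for j in range(n)] for i in range(n)])

def up(z: DBM) -> DBM:
    """Time successors of a canonical DBM: drop every upper bound x - 0."""
    d = [row[:] for row in z]
    for i in range(1, len(d)):
        d[i][0] = INF
    return d

def reset(z: DBM, clocks: Set[int]) -> DBM:
    """Reset_R of a canonical DBM: each reset clock copies the bounds of clock 0."""
    d = [row[:] for row in z]
    for x in clocks:
        for j in range(len(d)):
            d[x][j] = d[0][j]
            d[j][x] = d[j][0]
        d[x][x] = (0, 1)
    return _close(d)

def post(z: DBM, guard: DBM, clocks: Set[int], target_inv: DBM) -> DBM:
    """Inv(l') ∩ (Reset_R(g ∩ Z))↑ for an edge (l, g, R, l')."""
    return intersect(target_inv, up(reset(intersect(z, guard), clocks)))
```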


2.3 Clock-Predicate Abstraction and Interpolation 


For all clocks $x$ and $y$ in $C_0$, we consider a finite set $\mathcal{D}_{x,y} \subseteq \mathbb{Z} \times \{<, \le\}$, and gather these in a table $\mathcal{D} = (\mathcal{D}_{x,y})_{x,y \in C_0}$. $\mathcal{D}$ is the abstract domain which restricts zones to be defined only using constraints of the form $x - y \prec k$ with $(k, \prec) \in \mathcal{D}_{x,y}$, as seen earlier. Let us call $\mathcal{D}$ the concrete domain if $\mathcal{D}_{x,y} = \mathbb{Z} \times \{<, \le\}$ for all $x, y \in C_0$. A zone $Z$ is $\mathcal{D}$-definable if there exists a DBM $D$ such that $Z = [\![D]\!]$ and $D(x, y) \in \mathcal{D}_{x,y}$ for all $x, y \in C_0$. Note that we do not require this witness DBM $D$ to be reduced; the reduction of such a DBM might introduce additional values. We say that domain $\mathcal{D}'$ is a refinement of $\mathcal{D}$ if for all $x, y \in C_0$, we have $\mathcal{D}_{x,y} \subseteq \mathcal{D}'_{x,y}$.

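An abstract domain $\mathcal{D}$ induces an abstraction function $\alpha_{\mathcal{D}}\colon 2^{\mathbb{R}_{\ge 0}^{C}} \to 2^{\mathbb{R}_{\ge 0}^{C}}$, where $\alpha_{\mathcal{D}}(Z)$ is the smallest $\mathcal{D}$-definable zone containing $Z$. For any reduced DBM $D$, $\alpha_{\mathcal{D}}([\![D]\!])$ can be computed as $[\![D']\!]$, where $D'(x,y) = \min\{(k,\triangleleft) \in \mathcal{D}_{x,y} \mid D(x,y) \le (k,\triangleleft)\}$ (with $\min \emptyset = (\infty,<)$).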
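The computation of $\alpha_{\mathcal{D}}$ on a reduced DBM is a cellwise rounding-up, which the following Python sketch illustrates under assumptions of our own: bounds are `(k, s)` tuples with `s = 0` for $<$ and `s = 1` for $\le$, and the abstract domain is a dictionary from index pairs to sets of allowed bounds (pairs that are missing have no allowed bound).

```python
import math
from typing import Dict, List, Set, Tuple

Bound = Tuple[float, int]          # (k, 0) means "< k", (k, 1) means "<= k"
INF: Bound = (math.inf, 0)
Domain = Dict[Tuple[int, int], Set[Bound]]


def alpha(dbm: List[List[Bound]], domain: Domain) -> List[List[Bound]]:
    """Round every cell up to the smallest allowed bound that is at least as loose.

    D'(x, y) = min{b in D_{x,y} | D(x, y) <= b}, with the minimum of the empty set = INF.
    """
    n = len(dbm)
    return [
        [min((b for b in domain.get((i, j), set()) if dbm[i][j] <= b), default=INF)
         for j in range(n)]
        for i in range(n)
    ]
```

With a domain that only offers $x \le 2$, $x < 5$ and $x \le 7$ as upper bounds, the constraint $x \le 3$ is rounded up to $x < 5$. Diagonal cells become $(\infty,<)$, which does not change the zone; this is harmless since the witness DBM need not be reduced.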

An interpolant for a pair of zones $(Z_1, Z_2)$ with $Z_1 \cap Z_2 = \emptyset$ is a zone $Z_3$ with $Z_1 \subseteq Z_3$ and $Z_3 \cap Z_2 = \emptyset$¹ [29]. We use interpolants to refine our abstractions; in order not to add too many new constraints when refining, our aim is to find minimal interpolants. Define the density of a DBM $D$ as $d(D) = \#\{(x,y) \in C_0^2 \mid D(x,y) \neq (\infty,<)\}$. Notice that while any pair of disjoint convex polyhedra can be separated by a hyperplane, not all pairs of disjoint zones admit interpolants of density 1; this is because not all (half-spaces delimited by) hyperplanes are zones. Still, we can bound the density of a minimal interpolant:


Lemma 1. For any pair of disjoint, non-empty zones (A, B), there exists an 
interpolant of density less than or equal to |Co|/2. 


By adapting the algorithm of [29] for computing interpolants, we can compute 
minimal interpolants efficiently: 


Proposition 2. Computing a minimal interpolant can be performed in O(\C|*). 
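To make interpolants and density concrete, here is a deliberately naive Python sketch; it is not the polynomial algorithm of Proposition 2. It enumerates subsets of the entries of a canonical DBM $A$ by increasing size and returns the first one whose constraints are disjoint from $B$. Restricting to entries of $A$ loses nothing: every constraint of an interpolant holds on $A$, so it can be tightened to the corresponding canonical entry of $A$ without breaking either interpolant condition. Bounds are again `(k, s)` tuples, and all helper names are ours.

```python
import math
from itertools import combinations
from typing import List, Optional, Tuple

Bound = Tuple[float, int]           # (k, 0) means "< k", (k, 1) means "<= k"
INF: Bound = (math.inf, 0)
LE_ZERO: Bound = (0, 1)
DBM = List[List[Bound]]


def add(a: Bound, b: Bound) -> Bound:
    if a[0] == math.inf or b[0] == math.inf:
        return INF
    return (a[0] + b[0], min(a[1], b[1]))


def universe(n: int) -> DBM:
    m = [[INF] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = LE_ZERO
        m[0][i] = LE_ZERO
    return m


def canonicalize(m: DBM) -> bool:
    n = len(m)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                m[i][j] = min(m[i][j], add(m[i][k], m[k][j]))
    return all(m[i][i] >= LE_ZERO for i in range(n))


def density(d: DBM) -> int:
    """Number of cells different from (inf, <)."""
    return sum(cell != INF for row in d for cell in row)


def minimal_interpolant(a: DBM, b: DBM) -> Optional[DBM]:
    """Sparsest I with [[A]] ⊆ [[I]] and [[I]] ∩ [[B]] = ∅, for a canonical DBM a.

    Returns None when A and B intersect. Exponential in the number of cells.
    """
    n = len(a)
    cells = [(i, j) for i in range(n) for j in range(n) if i != j and a[i][j] != INF]
    for size in range(len(cells) + 1):
        for chosen in combinations(cells, size):
            cand = [[INF] * n for _ in range(n)]   # diagonal left at INF (no constraint)
            for i, j in chosen:
                cand[i][j] = a[i][j]
            meet = [[min(cand[i][j], b[i][j]) for j in range(n)] for i in range(n)]
            if not canonicalize(meet):
                return cand
    return None
```

For $A = \{x \le 1, y \ge 2\}$ and $B = \{x \ge 2\}$, the search stops at the density-1 interpolant $x \le 1$.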


3 Enumerative Algorithm 


The first type of algorithm we present is a zone-based enumerative algorithm based on clock-predicate abstractions. Let us first describe the overall algorithm in Algorithm 1, which is a typical abstraction-refinement loop. We then explain how the abstract reachability and refinement procedures are instantiated.

¹ It is sometimes also required that the interpolant only involves clocks that have non-trivial constraints in both $Z_1$ and $Z_2$. We do not impose this requirement in our definition, but it will hold true in the interpolants computed by our algorithm.


Algorithm 1. Enumerative algorithm checking the reachability of a target location $\ell_T$.
Input: $\mathcal{A} = (L, \mathrm{Inv}, \ell_0, C, E)$, $\ell_T$
 1  Initialize $\mathcal{D}_0$;
 2  wait := $\{\mathrm{node}(\ell_0, \mathbf{0}^{\uparrow}, \mathcal{D}_0)\}$;
 3  passed := $\emptyset$;
 4  while true do
 5      $\pi$ := AbsReach($\mathcal{A}$, wait, passed, $\ell_T$);
 6      if $\pi = \emptyset$ then
 7          return Not reachable;
 8      else
 9          if trace $\pi$ is feasible then
10              return Reachable;
11          else
                Refine($\pi$, wait, passed);
12  return Not reachable;

Algorithm 2. AbsReach
Input: $(L, \mathrm{Inv}, \ell_0, C, E)$, wait, passed, $\ell_T$
 1  while wait $\neq \emptyset$ do
 2      $n$ := wait.pop();
 3      if $n.\ell = \ell_T$ then
 4          return Trace from root to $n$;
 5      if $\exists n' \in$ passed such that $n.\ell = n'.\ell \wedge n.Z \subseteq n'.Z$ then
 6          $n$.covered := $n'$;
 7      else
 8          $n.Z := \alpha(n.Z, n)$;
 9          passed.add($n$);
10          for $e = (\ell, g, R, \ell') \in E(n.\ell)$ s.t. $Z' := \mathrm{Post}_e(n.Z) \neq \emptyset$ do
11              $\mathcal{D}'$ := choose-dom($n$, $e$);
12              $n'$ := node($\ell'$, $Z'$, $\mathcal{D}'$);
13              $n'$.parent := $n$;
14              wait.add($n'$);
15  return $\emptyset$;


The initialization at line 1 chooses an abstract domain for the initial state, which can be either empty (thus the coarsest abstraction) or defined according to some heuristics. The algorithm maintains the wait and passed lists that are used in the forward exploration. As usual, the wait list can be implemented as a stack, a queue, or another priority list that determines the search order. The algorithm also uses covering nodes. Indeed, if there are two nodes $n$ and $n'$ with $n \in$ passed, $n' \in$ wait, $n.\ell = n'.\ell$, and $n'.Z \subseteq n.Z$, then we know that every location reachable from $n'$ is also reachable from $n$. Since we have already explored $n$ and generated its successors, there is no need to explore the successors of $n'$. The algorithm explicitly creates an exploration tree: line 2 creates a node containing location $\ell_0$, zone $\mathbf{0}^{\uparrow}$, and the abstract domain $\mathcal{D}_0$ as the root of our tree, and adds this node to the wait list. More details on the tree are given in the next subsection. Procedure AbsReach then looks for a trace to the target location $\ell_T$. If such a trace exists, line 9 checks its feasibility. Here $\pi$ is a sequence of nodes and edges of $\mathcal{A}$. The feasibility check is done by computing predecessors with zones starting from the final state, without using the abstraction function. If the last zone intersects our initial zone, the trace is feasible. More details are given in Sect. 3.2.
3.1 Abstract Forward Reachability: AbsReach


We give a generic algorithm, independent of the implementations of the abstraction functions and of the refinement procedure.

Algorithm 2 describes the reachability procedure under a given abstract domain $\mathcal{D}$. It is similar to the standard forward reachability algorithm using a wait list and a passed list. We explicitly create an exploration tree where the leaves are nodes in wait, covered nodes, or nodes that have no non-empty successors. Each node $n$ contains the fields $\ell$ and $Z$, which are labels describing the current location and zone; field covered points to a node covering the current node (it is undefined if the current node is not (known to be) covered); field parent points to the parent node in the tree (it is undefined for the root); and field $\mathcal{D}$ is the abstract domain associated with the node. Thus, the algorithm uses a possibly different abstract domain for each node in the exploration tree.

Our algorithm differs from the standard reachability algorithm at lines 8 and 11. At line 8, we apply the abstraction function to the zone taken from the wait list before adding it to the passed list. The abstraction function $\alpha$ is a function of a zone $Z$ and a node $n$. This allows one to define variants with different dependencies; for instance, $\alpha$ might depend on the abstract domain $n.\mathcal{D}$ at the current node, but it can also use other information available in $n$ or on the path ending in $n$. For now, it is best to think of $\alpha$ simply as $Z \mapsto \alpha_{n.\mathcal{D}}(Z)$. At line 11, the function choose-dom chooses an abstract domain for the node $n'$. The domain could be chosen globally for all nodes, or locally for each node. A good trade-off, which we used in our experiments, is to associate domains with locations of the timed automaton.
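The control flow of AbsReach (Algorithm 2) can be sketched generically in Python. In this hypothetical sketch, zones are replaced by finite frozensets of clock values so that inclusion is plain set inclusion; `post(e, z)` stands in for $\mathrm{Post}_e$, `alpha(z, n)` for the abstraction function $\alpha$, and `choose_dom(n, e)` for choose-dom. The wait list is used as a stack, and the comments refer to the lines of Algorithm 2.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Hashable, List, Optional, Tuple

Edge = Tuple[Hashable, Hashable]   # (source location, target location)


@dataclass(eq=False)
class Node:
    loc: Hashable
    zone: FrozenSet[int]
    dom: object = None
    parent: Optional["Node"] = None
    covered: Optional["Node"] = None


def abs_reach(wait: List[Node], passed: List[Node], target: Hashable,
              edges: Callable[[Hashable], List[Edge]],
              post: Callable[[Edge, FrozenSet[int]], FrozenSet[int]],
              alpha: Callable[[FrozenSet[int], Node], FrozenSet[int]],
              choose_dom: Callable[[Node, Edge], object]) -> Optional[List[Node]]:
    while wait:                                        # line 1
        n = wait.pop()                                 # line 2
        if n.loc == target:                            # lines 3-4: trace from root to n
            trace = []
            node: Optional[Node] = n
            while node is not None:
                trace.append(node)
                node = node.parent
            return trace[::-1]
        cover = next((m for m in passed
                      if m.loc == n.loc and n.zone <= m.zone), None)
        if cover is not None:                          # lines 5-6: n is covered
            n.covered = cover
            continue
        n.zone = alpha(n.zone, n)                      # line 8: abstract before storing
        passed.append(n)                               # line 9
        for e in edges(n.loc):                         # line 10
            z = post(e, n.zone)
            if z:                                      # skip empty successors
                child = Node(e[1], z, choose_dom(n, e), parent=n)   # lines 11-13
                wait.append(child)                     # line 14
    return None                                        # line 15: no trace found
```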


Remark 1. Note that we use the abstraction function when the node is inserted 
in the passed list. This is because we want the node to contain the smallest zone 
possible when we test whether the node is covered. We only need to use the 
abstracted zone when we compute its successors and when we test whether the
node is covering. This allows us to store a unique zone. 


As a first step towards proving correctness of our algorithm, we show that 
the following property is preserved by Algorithm AbsReach: 


For all nodes $n$ in passed and all edges $e$ from $n.\ell$, if $\mathrm{Post}_e(n.Z) \neq \emptyset$, then $n$ has a child $n'$ such that $\mathrm{Post}_e(n.Z) \subseteq n'.Z$. If $n'$ is in passed, then we also have $\alpha_{n'.\mathcal{D}}(\mathrm{Post}_e(n.Z)) \subseteq n'.Z$.   (1)


Lemma 3. Algorithm AbsReach preserves Property (1). 


Note that although we use inclusion in Property (1), AbsReach would actually 
preserve equality of zones, but we will not always have equality before running 
AbsReach. This is because Refine might change the zones of some nodes without 
updating the zones of all their descendants. 
3.2 Refinement: Refine


We now describe our refinement procedure Refine. Assume that AbsReach returns $\pi = A_1 \xrightarrow{e_1} A_2 \xrightarrow{e_2} \cdots \xrightarrow{e_{k-1}} A_k$, and write $\mathcal{D}_i$ for the domain associated with each $A_i$. We write $C_1$ for the initial concrete zone, and for $i < k$, we define $C_{i+1} = \mathrm{Post}_{e_i}(A_i)$. We also let $Z_k = A_k$, and for $i < k$, $Z_i = \mathrm{Pre}_{e_i}(Z_{i+1}) \cap A_i$. Then $\pi$ is not feasible if, and only if, $\mathrm{Post}_{e_1 \cdots e_{k-1}}(C_1) = \emptyset$, or equivalently $\mathrm{Pre}_{e_1 \cdots e_{k-1}}(A_k) \cap C_1 = \emptyset$. Since $C_{i+1} \subseteq A_{i+1}$ holds for all $i < k$, $\pi$ is not feasible if, and only if, $\exists i \le k.\ C_i \cap Z_i = \emptyset$. We illustrate this in Fig. 2.


Fig. 2. Spurious counter-example: $Z_1 \cap C_1 = \emptyset$.


Let us assume that $\pi$ is not feasible. Let us denote by $i_0$ the maximal index such that $C_{i_0} \cap Z_{i_0} = \emptyset$. This index also has the property that for all $j < i_0$, we have $Z_j = \emptyset$, and $Z_{i_0} \neq \emptyset$. Once we have identified this trace as spurious by computing the $Z_i$, we have two possibilities:


- if $Z_{i_0} \cap \alpha_{\mathcal{D}_{i_0}}(C_{i_0}) \neq \emptyset$: this means that we can reach $A_k$ from $\alpha_{\mathcal{D}_{i_0}}(C_{i_0})$ but not from $C_{i_0}$. In other words, our abstraction is too coarse and we must add some values to $\mathcal{D}_{i_0}$ so that $Z_{i_0} \cap \alpha_{\mathcal{D}_{i_0}}(C_{i_0}) = \emptyset$. Those values are found by computing the interpolant of $Z_{i_0}$ and $C_{i_0}$.

- Otherwise, $\alpha_{\mathcal{D}_{i_0}}(C_{i_0})$ cannot reach $A_k$, and the only reason the trace exists is that either $\mathcal{D}_{i_0}$ or $A_{i_0-1}$ has been modified at some point and $A_{i_0}$ was not modified accordingly.
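The computation of the $C_i$, the $Z_i$ and the index $i_0$ can be illustrated on a toy instance. The Python sketch below is hypothetical: zones are replaced by finite sets of integers, `posts[i]` and `pres[i]` stand in for $\mathrm{Post}_{e_i}$ and $\mathrm{Pre}_{e_i}$, `alpha` stands in for $\alpha_{\mathcal{D}_{i_0}}$, and indices are 0-based. It reports which of the two cases above applies.

```python
from typing import Callable, FrozenSet, List, Optional, Tuple

Zone = FrozenSet[int]
Op = Callable[[Zone], Zone]


def classify(abs_sets: List[Zone], posts: List[Op], pres: List[Op],
             c1: Zone, alpha: Op) -> Tuple[str, Optional[int]]:
    k = len(abs_sets)
    # Forward: C_1 is the initial concrete zone and C_{i+1} = Post_{e_i}(A_i).
    c = [c1] + [posts[i](abs_sets[i]) for i in range(k - 1)]
    # Backward: Z_k = A_k and Z_i = Pre_{e_i}(Z_{i+1}) ∩ A_i.
    z: List[Zone] = [frozenset()] * k
    z[k - 1] = abs_sets[k - 1]
    for i in range(k - 2, -1, -1):
        z[i] = pres[i](z[i + 1]) & abs_sets[i]
    empty = [i for i in range(k) if not (c[i] & z[i])]
    if not empty:
        return "feasible", None
    i0 = max(empty)
    if alpha(c[i0]) & z[i0]:
        return "abstraction too coarse", i0   # refine D_{i0} with an interpolant
    return "stale node", i0                  # A_{i0} was not updated after a refinement
```

In the usage below, the edge has guard $v \ge 1$ and increments $v$; the toy abstraction merges $\{0,1\}$, $\{2,3\}$, and so on.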


We can then update the values of $C_i$ for $i > i_0$ and repeat the process until we reach an index $j_0$ such that $C_{j_0} = \emptyset$. We have then modified the nodes $n_{i_0}, \dots, n_{j_0}$, and knowing that $n_{j_0}.Z = \emptyset$, we can delete it and all of its descendants. Since some of the descendants of $n_{i_0}$ have not been modified, this might cause some refinements of the first type in the future. In order to ensure termination, we sometimes have to cut a subtree from a node in $n_{i_0}, \dots, n_{j_0-1}$ and reinsert it in the wait list to restart the exploration from there. We call this action cut, and we can use several heuristics to decide when to use it. In the rest of this paper, we use the following heuristic: we perform cut on the first node of $n_{i_0}, \dots, n_{j_0}$ that is covered by some other node. Since this node is covered, we know that we will not restart the exploration from this node, or that the node was covered by one of its descendants. If none of these nodes is covered, we delete $n_{j_0}$ and its descendants. Other heuristics are possible, for instance applying cut on $n_{i_0}$. We found that the above heuristic was the most efficient in our experiments.


Lemma 4. Pick a node $n$, and let $Y = n.Z$. Then after running Refine, either node $n$ is deleted, or it holds $n.Z \subseteq Y$. In other words, the zone of a node can only be reduced by Refine.


It follows that Refine also preserves Property (1), so that: 
Lemma 5. Algorithm 1 satisfies Property (1). 


We can then prove that our algorithm correctly decides the reachability prob- 
lem and always terminates. 


Theorem 6. Algorithm 1 terminates and is correct. 


4 Symbolic Algorithm 


4.1 Boolean Encoding of Zones 


We now present a symbolic algorithm that represents abstract states using Boolean formulas. Let $\mathbb{B} = \{0,1\}$, and let $V$ be a set of variables. A Boolean formula $f$ that uses variables from a set $X \subseteq V$ will be written $f(X)$ to make the dependency explicit; we sometimes write $f(X,Y)$ in place of $f(X \cup Y)$. Such a formula represents a set $[\![f]\!] = \{v \in \mathbb{B}^V \mid v \models f\}$. We consider primed versions of all variables; this will allow us to write formulas relating two valuations. For any subset $X \subseteq V$, we define $X' = \{p' \mid p \in X\}$.

A literal is either $p$ or $\neg p$ for a variable $p$. Given a set $X$ of variables, an $X$-minterm is a conjunction of literals where each variable of $X$ appears exactly once. $X$-minterms can be seen as elements of $\mathbb{B}^X$. Given a vector of Boolean formulas $Y = (Y_x)_{x \in X}$, formula $f[Y/X]$ is the substitution of $X$ by $Y$ in $f$, obtained by replacing each $x \in X$ with the formula $Y_x$. The positive cofactor of $f(X)$ by $x$ is $\exists x.\,(x \wedge f(X))$, and its negative cofactor is $\exists x.\,(\neg x \wedge f(X))$.

Let us define a generic operator post that computes successors of a set $S(X,Y)$ given a relation $R(X,X')$ (here, $Y$ designates any set of variables on which $S$ might depend outside of $X$): $\mathrm{post}_R(S(X,Y)) = (\exists X.\, S(X,Y) \wedge R(X,X'))[X/X']$. Similarly, we set $\mathrm{pre}_R(S(X,Y)) = \exists X'.\,(S(X,Y)[X'/X] \wedge R(X,X'))$, which computes the predecessors of $S(X,Y)$ by the relation $R$ [24].
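Since BDDs represent sets of valuations, the effect of $\mathrm{post}_R$ and $\mathrm{pre}_R$ can be checked on explicit sets. In this Python sketch, which is an explicit-set stand-in and not a BDD implementation, a state of $S(X,Y)$ is a pair `(xs, ys)` of tuples of Booleans for $X$ and $Y$, and the relation $R(X,X')$ is a set of pairs `(xs, xs_next)`. Existential quantification becomes a projection, and the renaming $[X/X']$ is implicit in which component is kept.

```python
from typing import Set, Tuple

Vals = Tuple[bool, ...]
States = Set[Tuple[Vals, Vals]]          # pairs (values of X, values of Y)
Relation = Set[Tuple[Vals, Vals]]        # pairs (values of X, values of X')


def post(s: States, r: Relation) -> States:
    """post_R(S) = (∃X. S(X, Y) ∧ R(X, X'))[X/X']: keep the X' part, renamed back to X."""
    return {(x_next, ys) for (xs, ys) in s for (x_src, x_next) in r if x_src == xs}


def pre(s: States, r: Relation) -> States:
    """pre_R(S) = ∃X'. S(X, Y)[X'/X] ∧ R(X, X'): keep the X part of each matching R-pair."""
    return {(x_src, ys) for (xs, ys) in s for (x_src, x_next) in r if x_next == xs}
```

With the relation "set the single variable of $X$ to 1", a state with $X = 0$ has one successor, and a state with $X = 1$ has two predecessors; the $Y$ part is carried along unchanged.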




Clock Predicate Abstraction. We fix a total order $\le$ on $C_0$. In this section, abstract domains are defined as $\mathcal{D} = (\mathcal{D}_{x,y})_{x < y \in C_0}$, that is, only for pairs $x < y$. In fact, constraints of the form $x - y \triangleleft k$ with $x > y$ are encoded using the negation of $y - x \triangleleft^{-1} -k$, since $(x - y \triangleleft k) \Leftrightarrow \neg(y - x \triangleleft^{-1} -k)$. We thus define $\mathcal{D}_{x,y} = -\mathcal{D}_{y,x}$ for all $x > y$.
For $x, y \in C_0$, let $P_{x,y}$ denote the set of clock predicates associated with $\mathcal{D}_{x,y}$:

$$P_{x,y} = \{p_{x-y \triangleleft k} \mid (k,\triangleleft) \in \mathcal{D}_{x,y}\}.$$

Let $P^{\mathcal{D}} = \bigcup_{x,y \in C_0} P_{x,y}$ denote the set of all clock predicates associated with $\mathcal{D}$ (we may omit the superscript $\mathcal{D}$ when it is clear). For all $(x,y) \in C_0^2$ and $(k,\triangleleft) \in \mathcal{D}_{x,y}$, we denote by $p_{x-y\triangleleft k}$ the literal $p_{x-y\triangleleft k}$ if $x < y$, and $\neg p_{y-x\triangleleft^{-1} -k}$ otherwise (where ${<^{-1}} = {\le}$ and ${\le^{-1}} = {<}$). We also consider a set $\mathcal{B}$ of Boolean variables used to encode locations. Overall, the state space is described using Boolean formulas on these two types of variables, so states are elements of $\mathbb{B}^{P \cup \mathcal{B}}$.

Our Boolean encoding of clock constraints and semantic operations follows that of [28] for a concrete domain. We define these, however, for abstract domains, and show how successor computation and refinement operations can be performed.

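Let us define the clock semantics of predicate $p_{x-y\triangleleft k}$ as $[\![p_{x-y\triangleleft k}]\!]_C = \{v \in \mathbb{R}_{\ge 0}^{C} \mid v(x) - v(y) \triangleleft k\}$. Since the set $C$ of clocks is fixed, we may omit the subscript and just write $[\![p_{x-y\triangleleft k}]\!]$. We define the conjunction, disjunction, and negation as intersection, union, and complement, respectively. Given a $P$-minterm $v \in \mathbb{B}^P$, we define $[\![v]\!]_P = \bigcap_{p \text{ s.t. } v_p} [\![p]\!] \cap \bigcap_{p \text{ s.t. } \neg v_p} \overline{[\![p]\!]}$. Thus, negation of a predicate encodes its complement. For a Boolean formula $F(P)$, we set $[\![F]\!]_P = \bigcup_{v \in \mathrm{minterms}(F)} [\![v]\!]_P$. Intuitively, the minterms of $P$ define the smallest zones of $\mathbb{R}_{\ge 0}^{C}$ definable using $P$. A minterm $v \in \mathbb{B}^{\mathcal{B} \cup P}$ defines a pair $[\![v]\!]_{\mathcal{D}} = (\ell, Z)$, where $\ell$ is encoded by $v_{\mathcal{B}}$ and $Z = [\![v_P]\!]_P$. A Boolean formula $F$ on $\mathcal{B} \cup P$ defines a set $[\![F]\!]_{\mathcal{D}} = \bigcup_{v \in \mathrm{minterms}(F)} [\![v]\!]_{\mathcal{D}}$ of such pairs. A minterm $v$ is satisfiable if $[\![v]\!]_{\mathcal{D}} \neq \emptyset$.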

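An abstract domain $\mathcal{D}$ induces an abstraction function $\alpha_{\mathcal{D}}\colon 2^{\mathbb{R}_{\ge 0}^{C}} \to 2^{\mathbb{B}^P}$ with $\alpha_{\mathcal{D}}(Z) = \{v \mid v \in \mathbb{B}^P \text{ and } [\![v]\!]_P \cap Z \neq \emptyset\}$, from the set of zones to the set of subsets of Boolean valuations on $P$. We define the concretization function as $[\![\cdot]\!]_P\colon 2^{\mathbb{B}^P} \to 2^{\mathbb{R}_{\ge 0}^{C}}$. The pair $(\alpha_{\mathcal{D}}, [\![\cdot]\!]_P)$ is a Galois connection, and $[\![\alpha_{\mathcal{D}}(Z)]\!]_P$ is the most precise abstraction of $Z$ in the domain induced by $\mathcal{D}$. Notice that $\alpha_{\mathcal{D}}$ is non-convex in general: for instance, if the clock predicates are $x \le 2$ and $y \le 2$, then the set defined by the constraint $x = y$ maps to $(p_{x\le 2} \wedge p_{y\le 2}) \vee (\neg p_{x\le 2} \wedge \neg p_{y\le 2})$.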
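The non-convexity example can be checked mechanically. The following Python sketch uses our own encoding (bounds as `(k, s)` tuples with `s = 1` for $\le$, index 0 for the zero clock, and a predicate as a triple `(i, j, bound)` meaning $c_i - c_j \triangleleft k$). It enumerates all $P$-minterms, turns each into a DBM by adding either the predicate or its negation $c_j - c_i \triangleleft^{-1} -k$, and keeps the minterms whose concretization meets the zone. This computes $\alpha_{\mathcal{D}}(Z)$ exactly, because every minterm denotes a zone.

```python
import math
from itertools import product
from typing import List, Set, Tuple

Bound = Tuple[float, int]           # (k, 0) means "< k", (k, 1) means "<= k"
INF: Bound = (math.inf, 0)
LE_ZERO: Bound = (0, 1)
DBM = List[List[Bound]]
Pred = Tuple[int, int, Bound]       # (i, j, (k, s)) means c_i - c_j ◁ k


def add(a: Bound, b: Bound) -> Bound:
    if a[0] == math.inf or b[0] == math.inf:
        return INF
    return (a[0] + b[0], min(a[1], b[1]))


def universe(n: int) -> DBM:
    m = [[INF] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = LE_ZERO
        m[0][i] = LE_ZERO
    return m


def canonicalize(m: DBM) -> bool:
    n = len(m)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                m[i][j] = min(m[i][j], add(m[i][k], m[k][j]))
    return all(m[i][i] >= LE_ZERO for i in range(n))


def alpha_minterms(zone: DBM, preds: List[Pred]) -> Set[Tuple[bool, ...]]:
    """alpha_D(Z): the P-minterms whose concretization intersects the zone."""
    result = set()
    for bits in product((True, False), repeat=len(preds)):
        m = [row[:] for row in zone]
        for (i, j, (k, s)), holds in zip(preds, bits):
            if holds:
                m[i][j] = min(m[i][j], (k, s))          # c_i - c_j ◁ k
            else:
                m[j][i] = min(m[j][i], (-k, 1 - s))     # c_j - c_i ◁⁻¹ -k
        if canonicalize(m):
            result.add(bits)
    return result
```

For the zone $x = y$ and the predicates $x \le 2$, $y \le 2$, only the two minterms of the example survive.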


4.2 Reduction and Successor Computation 


We now define the reduction operation, which is similar to the reduction of DBMs. The idea is to eliminate unsatisfiable minterms from a given Boolean formula. For example, we would like to make sure that in all minterms, if $p_{x-y\le 1}$ holds, then so does $p_{x-y\le 2}$, when both are available predicates. Another issue is to eliminate minterms that are unsatisfiable due to the triangle inequality. This is similar to the shortest-path computation used to turn DBMs into canonical form.


Example 1. Given predicates $P = \{p_{x-y\le 1}, p_{y-z\le 1}, p_{x-z\le 2}\}$, the formula $p_{x-y\le 1} \wedge p_{y-z\le 1}$ is not reduced, since it contains the unsatisfiable minterm $p_{x-y\le 1} \wedge p_{y-z\le 1} \wedge \neg p_{x-z\le 2}$. However, the same formula is reduced if $P = \{p_{x-y\le 1}, p_{y-z\le 1}\}$.

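In this paper, we use limited reduction, since reductions are the most expensive operations in our algorithms. The following formula corresponds to 2-reduction, which intuitively amounts to applying shortest paths for paths of lengths 1 and 2:

$$\bigwedge_{\substack{(x,y) \in C_0^2\\ (k,\triangleleft) \in \mathcal{D}_{x,y}}} \Bigg( \Big( \bigvee_{\substack{(l_1,\triangleleft_1) \in \mathcal{D}_{x,y}\\ (l_1,\triangleleft_1) \le (k,\triangleleft)}} p_{x-y\triangleleft_1 l_1} \;\vee \bigvee_{\substack{z \in C_0,\ (l_1,\triangleleft_1) \in \mathcal{D}_{x,z},\ (l_2,\triangleleft_2) \in \mathcal{D}_{z,y}\\ (l_1,\triangleleft_1) + (l_2,\triangleleft_2) \le (k,\triangleleft)}} \big(p_{x-z\triangleleft_1 l_1} \wedge p_{z-y\triangleleft_2 l_2}\big) \Big) \Rightarrow p_{x-y\triangleleft k} \Bigg)$$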

Lemma 7. For all formulas $S(P)$, we have $[\![S]\!]_P = [\![\mathrm{reduce}_2^{\mathcal{D}}(S)]\!]_P$, and all minterms of $\mathrm{reduce}_2^{\mathcal{D}}(S)$ are 2-reduced.
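A direct check of the 2-reduction condition on a single minterm may help in reading the formula above. In this Python sketch, which is our own illustration and not the BDD construction, a predicate is a tuple `(x, y, k, s)` with $x < y$ and `s = 1` for $\le$ (`s = 0` for $<$); every predicate also yields the reversed literal $y - x \triangleleft^{-1} -k$ with the opposite truth value. A minterm violates 2-reduction if some literal is false although it is implied by one true literal on the same pair (a path of length 1) or by two true literals through an intermediate clock $z$ (a path of length 2).

```python
from typing import Dict, Tuple

Bound = Tuple[int, int]                    # (k, 0) means "< k", (k, 1) means "<= k"
Pred = Tuple[str, str, int, int]           # (x, y, k, s) encodes x - y ◁ k with x < y


def literals(minterm: Dict[Pred, bool]) -> Dict[Tuple[str, str, Bound], bool]:
    """Truth value of every literal x - y ◁ k, including the reversed ones."""
    lits = {}
    for (x, y, k, s), val in minterm.items():
        lits[(x, y, (k, s))] = val
        lits[(y, x, (-k, 1 - s))] = not val    # ¬(x - y ◁ k) ⇔ y - x ◁⁻¹ -k
    return lits


def is_2_reduced(minterm: Dict[Pred, bool]) -> bool:
    lits = literals(minterm)
    true_lits = [key for key, val in lits.items() if val]
    for (x, y, b), val in lits.items():
        if val:
            continue
        # Length-1 paths: a true literal on the same pair with a bound at most b.
        if any(tx == x and ty == y and tb <= b for tx, ty, tb in true_lits):
            return False
        # Length-2 paths x -> z -> y whose bounds sum to at most b.
        for x1, z, b1 in true_lits:
            for z2, y2, b2 in true_lits:
                if x1 == x and z2 == z and y2 == y and \
                        (b1[0] + b2[0], min(b1[1], b2[1])) <= b:
                    return False
    return True
```

On Example 1, the minterm $p_{x-y\le 1} \wedge p_{y-z\le 1} \wedge \neg p_{x-z\le 2}$ is rejected, whereas the same two true predicates are 2-reduced when $p_{x-z\le 2}$ is not available.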

Since 2-reduction does not consider shortest paths of all lengths, there are, in general, 2-reduced unsatisfiable minterms. Nevertheless, any abstraction can be refined so that the updated 2-reduction eliminates a given unsatisfiable minterm:


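Lemma 8. Let $v \in \mathbb{B}^P$ be a minterm such that $v \models \mathrm{reduce}_2^{\mathcal{D}}$ and $[\![v]\!] = \emptyset$. One can compute in polynomial time a refinement $\mathcal{D}' \supseteq \mathcal{D}$ such that $v \not\models \mathrm{reduce}_2^{\mathcal{D}'}$.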


We now explain how successor computation is realized in our encoding. For a guard $g$, assume we have computed an abstraction $\alpha_{\mathcal{D}}(g)$ in the present abstract domain. For each transition $\sigma = (\ell_1, g, R, \ell_2)$, let us define the formula $T_\sigma = \ell_1 \wedge \alpha_{\mathcal{D}}(g)$. We show how each basic operation on zones can be computed in our BDD encoding. In our algorithm, all formulas $A(\mathcal{B},P)$ representing sets of states are assumed to be reduced, that is, $A(\mathcal{B},P) \subseteq \mathrm{reduce}_2^{\mathcal{D}}(A(\mathcal{B},P))$.

The intersection operation is simply logical conjunction: 


Lemma 9. For all reduced formulas $A(P)$ and $B(P)$, we have $A(P) \wedge B(P) = \alpha_{\mathcal{D}}([\![A(P)]\!]_P \cap [\![B(P)]\!]_P)$.

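For the time successors, we define $\mathrm{Up}_{\mathcal{D}}(A(\mathcal{B},P)) = \mathrm{reduce}_2^{\mathcal{D}}(\mathrm{post}_{S_{up}}(A(\mathcal{B},P)))$, where

$$S_{up} = \bigwedge_{\substack{x \in C\\ (k,\triangleleft) \in \mathcal{D}_{x,0}}} \big(\neg p_{x-0\triangleleft k} \rightarrow \neg p'_{x-0\triangleleft k}\big) \;\wedge \bigwedge_{\substack{x,y \in C_0,\ x \neq 0,\ y \neq 0\\ (k,\triangleleft) \in \mathcal{D}_{x,y}}} \big(p_{x-y\triangleleft k} \leftrightarrow p'_{x-y\triangleleft k}\big).$$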


Lemma 10. For any Boolean formula $A(\mathcal{B},P)$, $\alpha_{\mathcal{D}}([\![A]\!]^{\uparrow}) \subseteq \mathrm{Up}_{\mathcal{D}}(A)$. Moreover, if $\mathcal{D}$ is the concrete domain and $A$ is reduced, then this holds with equality.
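The relation $S_{up}$ leaves diagonal predicates unchanged and lets a true upper bound $x - 0 \triangleleft k$ become false, while a false upper bound stays false. The Python sketch below, an explicit-minterm illustration of our own without the reduction step, writes predicates as `(x, y, k, s)` with `"0"` naming the zero clock and enumerates $\mathrm{post}_{S_{up}}$ of a single minterm.

```python
from itertools import product
from typing import Dict, List, Tuple

Pred = Tuple[str, str, int, int]   # (x, y, k, s): x - y < k if s == 0, x - y <= k if s == 1
ZERO = "0"


def up_successors(minterm: Dict[Pred, bool]) -> List[Dict[Pred, bool]]:
    """All minterms v' with (v, v') in S_up, before applying reduce."""
    preds = list(minterm)
    choices = []
    for p in preds:
        if p[1] == ZERO and minterm[p]:
            choices.append((True, False))   # a true upper bound may be exceeded by delaying
        else:
            choices.append((minterm[p],))   # false upper bounds and diagonals keep their value
    return [dict(zip(preds, vals)) for vals in product(*choices)]
```

Starting from $x \le 2 \wedge x \le 5 \wedge \neg(x - y \le 1)$, this yields four minterms. One of them, $x \le 2 \wedge \neg(x \le 5)$, is unsatisfiable, and removing such minterms is the role of $\mathrm{reduce}_2^{\mathcal{D}}$ in $\mathrm{Up}_{\mathcal{D}}$.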
Following similar ideas, we handle clock resets by defining $\mathrm{Reset}_x(A) = \mathrm{reduce}_2^{\mathcal{D}}(\mathrm{post}_{S_{Reset_x}}(A))$, for a (complex) relation $S_{Reset_x}$ that encodes how predicates evolve (see the long version [27] of this article for more detailed explanations). We get:
Lemma 11. For any Boolean formula $A(\mathcal{B},P)$ and any clock $x \in C$, we have $\alpha_{\mathcal{D}}(\mathrm{Reset}_x([\![A]\!]_P)) \subseteq \mathrm{Reset}_x(A)$. Moreover, if $\mathcal{D}$ is the concrete domain and $A$ is reduced, then the above holds with equality.
Algorithm 3. Algorithm SymReach that checks the reachability of a target location $\ell_T$ in a given abstract domain $\mathcal{D}$.
Input: $\mathcal{A} = (L, \mathrm{Inv}, \ell_0, C, E)$, $\ell_T$, $\mathcal{D}$
 1  next := $\mathrm{enc}(\ell_0) \wedge \alpha_{\mathcal{D}}(\bigwedge_{x \in C} x = 0)$;
 2  layers := [];
 3  reachable := false;
 4  while $(\neg$reachable $\wedge$ next$) \neq$ false do
 5      reachable := reachable $\vee$ next;
 6      next := ApplyEdges($\mathrm{Up}_{\mathcal{D}}$(next)) $\wedge\ \neg$reachable;
 7      layers.push(next);
 8      if (next $\wedge\ \mathrm{enc}(\ell_T)) \neq$ false then
 9          return ExtractTrace(layers);
10  return Not reachable;
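The layered loop of Algorithm 3 can be mirrored on explicit sets of states. In this Python sketch, sets stand in for BDDs, the hypothetical function `step` plays the role of ApplyEdges composed with $\mathrm{Up}_{\mathcal{D}}$, and the layers are returned instead of an extracted trace.

```python
from typing import Callable, Hashable, List, Optional, Set

State = Hashable


def sym_reach(init: Set[State], target: Set[State],
              step: Callable[[Set[State]], Set[State]]) -> Optional[List[Set[State]]]:
    nxt = set(init)
    layers: List[Set[State]] = []
    reachable: Set[State] = set()
    while nxt - reachable:                   # (¬reachable ∧ next) ≠ false
        reachable |= nxt
        nxt = step(nxt) - reachable          # ApplyEdges(Up(next)) ∧ ¬reachable
        layers.append(nxt)
        if nxt & target:                     # (next ∧ enc(l_T)) ≠ false
            return layers                    # ExtractTrace would start from these layers
    return None                              # Not reachable
```

Because each new layer excludes the states already reached, the loop terminates as soon as a step produces no new state.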


4.3 Model-Checking Algorithm 


Algorithm 3 shows how to check the reachability of a target location given an abstract domain. The list layers contains, at position $i$, the set of states that are reachable in $i$ steps. The function ApplyEdges computes the disjunction of immediate successors by all edges. It consists of looping over all edges $e = (\ell_1, g, R, \ell_2)$ and gathering the following image by $e$:

$$\mathrm{enc}(\ell_2) \wedge \mathrm{Reset}_{r_k}\Big(\mathrm{Reset}_{r_{k-1}}\big(\cdots \mathrm{Reset}_{r_1}\big((\exists \mathcal{B}.\, A(\mathcal{B},P) \wedge \mathrm{enc}(\ell_1)) \wedge \alpha_{\mathcal{D}}(g)\big) \cdots\big)\Big),$$

where $R = \{r_1, \dots, r_k\}$. We thus use a partitioned transition relation and do not compute the monolithic transition relation.

When the target location is found to be reachable, ExtractTrace(layers) returns a trace reaching the target location. This is standard and can be done by computing backwards from the last element of layers and finding which edge can be applied to reach the current state. Since both reset and time-successor operations are defined using relations, predecessors in our abstract system can be easily computed using the operator $\mathrm{pre}_R$. As is standard, we omit the precise definition of this function (the reader can refer to the implementation), but assume that it returns a trace of the form $A_1 \xrightarrow{\sigma_1} A_2 \xrightarrow{\sigma_2} \cdots \xrightarrow{\sigma_{n-1}} A_n$, where the $A_i(\mathcal{B},P)$ are minterms and the $\sigma_i$ belong to the trace alphabet $\Sigma = \{up, r_\emptyset\} \cup \{r(x)\}_{x \in C}$, with the following meaning:

- if $A_i \xrightarrow{up} A_{i+1}$ then $A_{i+1} = \mathrm{Up}_{\mathcal{D}}(A_i)$;
- if $A_i \xrightarrow{r_\emptyset} A_{i+1}$ then $A_{i+1} = A_i$;
- if $A_i \xrightarrow{r(x)} A_{i+1}$ then $A_{i+1} = \mathrm{Reset}_x(A_i)$.


The feasibility of such a trace is easily checked using DBMs.

The overall algorithm then follows a classical CEGAR scheme. We initialize $\mathcal{D}$ by adding the clock constraints that appear syntactically in $\mathcal{A}$, which is often a good heuristic. We run the reachability check of Algorithm 3. If no trace is found, then the target location is not reachable. If a trace is found, then we check it for feasibility. If it is feasible, then the counterexample is confirmed. Otherwise, the trace is spurious: we run the refinement procedure described in the next subsection and repeat the analysis.


4.4 Abstraction Refinement 


Since we initialize $\mathcal{D}$ with all clock constraints appearing in guards, we can assume that all guards are represented exactly in the considered abstractions. Note that the algorithm can easily be extended to the general case, but this assumption simplifies the presentation.

The abstract transition relation we use is not the most precise abstraction of the concrete transition relation. Therefore, it is possible to have abstract transitions $A_1 \xrightarrow{a} A_2$ for some action $a$ while no concrete transition exists between $[\![A_1]\!]$ and $[\![A_2]\!]$. This requires care and is not a direct application of the standard refinement technique from [11]. A second difficulty is due to the incomplete reduction of the predicates using $\mathrm{reduce}_2^{\mathcal{D}}$: some reachable states in our abstract model will be unsatisfiable. Let us explain how we refine the abstraction in each of these cases.

Consider an algorithm interp which returns an interpolant of given zones $Z_1, Z_2$. In what follows, by the refinement of $\mathcal{D}$ by $\mathrm{interp}(Z_1, Z_2)$, we mean the domain $\mathcal{D}'$ obtained by adding $(k,\triangleleft)$ to $\mathcal{D}_{x,y}$ for all constraints $x - y \triangleleft k$ of $\mathrm{interp}(Z_1, Z_2)$. Observe that $\alpha_{\mathcal{D}'}(Z_1) \cap \alpha_{\mathcal{D}'}(Z_2) = \emptyset$ in this case.

We define concrete successor and predecessor operations for the actions in $\Sigma$. For each $a \in \Sigma$, let $\mathrm{Pre}^a$ denote the concrete predecessor operation on zones, defined straightforwardly, and similarly for $\mathrm{Post}^a$.

Consider domain $\mathcal{D}$ and the induced abstraction function $\alpha_{\mathcal{D}}$. Assume that we are given a spurious trace $\pi = A_1 \xrightarrow{\sigma_1} A_2 \xrightarrow{\sigma_2} \cdots \xrightarrow{\sigma_{n-1}} A_n$. Let $B_1, \dots, B_n$ be the sequence of concrete states visited along $\pi$ in $\mathcal{A}$, that is, $B_1$ is the concrete initial state, and for all $2 \le i \le n$, $B_i = \mathrm{Post}^{\sigma_{i-1}}(B_{i-1}) \cap [\![A_i]\!]$. This sequence can be computed using DBMs.

The trace is realizable if $B_n \neq \emptyset$, in which case the counterexample is confirmed. Otherwise it is spurious. We show how to refine the abstraction to eliminate a spurious trace $\pi$.

Let $i_0$ be the maximal index such that $B_{i_0} \neq \emptyset$. There are three possible reasons explaining why $B_{i_0+1}$ is empty:

1. first, the abstract successor $A_{i_0+1}$ may be unsatisfiable, that is, it contains contradictory predicates; in this case, $[\![A_{i_0+1}]\!] = \emptyset$, and the abstraction is refined by Lemma 8 to eliminate this case by strengthening $\mathrm{reduce}_2^{\mathcal{D}}$;

2. there may be predecessors of $A_{i_0+1}$ inside $A_{i_0}$, none of which is in $B_{i_0}$, i.e., $\mathrm{Pre}^{\sigma_{i_0}}([\![A_{i_0+1}]\!]) \cap [\![A_{i_0}]\!] \neq \emptyset$; in this case, we refine the domain by separating these predecessors from the rest of $A_{i_0}$, using $\mathrm{interp}(\mathrm{Pre}^{\sigma_{i_0}}([\![A_{i_0+1}]\!]), B_{i_0})$, as in [11];

3. otherwise, there are no predecessors of $A_{i_0+1}$ inside $A_{i_0}$: we refine the abstraction according to the type of the transition from step $i_0$ to $i_0+1$:
   (a) if $\sigma_{i_0} = up$: refine $\mathcal{D}$ by $\mathrm{interp}([\![A_{i_0}]\!]^{\uparrow}, [\![A_{i_0+1}]\!])$;
   (b) if $\sigma_{i_0} = r(x)$: refine $\mathcal{D}$ by $\mathrm{interp}(\mathrm{Free}_x([\![A_{i_0}]\!]), \mathrm{Free}_x([\![A_{i_0+1}]\!]))$.


Note that the case $\sigma_{i_0} = r_\emptyset$ is not possible, since this induces the identity function both in the abstract and concrete systems.
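Case (b) separates $\mathrm{Free}_x([\![A_{i_0}]\!])$ from $\mathrm{Free}_x([\![A_{i_0+1}]\!])$. On canonical DBMs, $\mathrm{Free}_x$ removes every upper bound on $x - y$ and replaces each bound on $y - x$ by the bound on $y - 0$, since $x$ may take any non-negative value. The Python sketch below (same `(k, s)` encoding of bounds as in the earlier DBM sketches, helper names ours) implements this operation and tests whether the two freed zones are disjoint, which is what a call to interp requires.

```python
import math
from typing import List, Tuple

Bound = Tuple[float, int]           # (k, 0) means "< k", (k, 1) means "<= k"
INF: Bound = (math.inf, 0)
LE_ZERO: Bound = (0, 1)
DBM = List[List[Bound]]


def add(a: Bound, b: Bound) -> Bound:
    if a[0] == math.inf or b[0] == math.inf:
        return INF
    return (a[0] + b[0], min(a[1], b[1]))


def universe(n: int) -> DBM:
    m = [[INF] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = LE_ZERO
        m[0][i] = LE_ZERO
    return m


def canonicalize(m: DBM) -> bool:
    n = len(m)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                m[i][j] = min(m[i][j], add(m[i][k], m[k][j]))
    return all(m[i][i] >= LE_ZERO for i in range(n))


def free(m: DBM, x: int) -> DBM:
    """Free_x on a canonical DBM: forget every constraint on clock x except x >= 0."""
    out = [row[:] for row in m]
    for j in range(len(m)):
        if j != x:
            out[x][j] = INF        # no upper bound on x - c_j any more
            out[j][x] = m[j][0]    # c_j - x is bounded like c_j - 0, since x may be 0
    return out


def disjoint_after_free(a: DBM, b: DBM, x: int) -> bool:
    fa, fb = free(a, x), free(b, x)
    meet = [[min(p, q) for p, q in zip(ra, rb)] for ra, rb in zip(fa, fb)]
    return not canonicalize(meet)
```

For example, freeing $x$ in $\{x \le 1, y \le 1\}$ and in $\{y \ge 3\}$ gives disjoint zones, whereas $\{x \ge 4, y \le 5\}$ still overlaps $\{x \le 1, y \le 1\}$ once $x$ is freed.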

Given abstraction $\alpha_{\mathcal{D}}$ and spurious trace $\pi$, let $\mathrm{refine}(\alpha_{\mathcal{D}}, \pi)$ denote the refined abstraction $\alpha_{\mathcal{D}'}$ obtained as described above.

The following two lemmas justify the two subcases of the third case above. They prove that the detected spurious transition disappears after refinement. The reset and up operations depend on the abstraction, so we make this dependence explicit below by using superscripts, as in $\mathrm{Reset}^{\alpha}$ and $\mathrm{Up}^{\alpha}$, in order to distinguish the operations before and after a refinement.


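Lemma 12. Consider $(A_1, A_2) \in \mathrm{Up}^{\alpha}$ with $[\![A_1]\!] \cap [\![A_2]\!]^{\downarrow} = \emptyset$. Then $[\![A_1]\!]^{\uparrow} \cap [\![A_2]\!] = \emptyset$. Moreover, if $\alpha'$ is obtained by refinement of $\alpha$ by $\mathrm{interp}([\![A_1]\!]^{\uparrow}, [\![A_2]\!])$, then for all $(A'_1, A'_2) \in \mathrm{Up}^{\alpha'}$, $[\![A'_1]\!] \subseteq [\![A_1]\!]$ implies $[\![A'_2]\!] \cap [\![A_2]\!] = \emptyset$.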


(a) Refinement for the time successors operation. The interpolant that separates $[\![A_1]\!]^{\uparrow}$ from $[\![A_2]\!]$ contains the constraint $x = y + 2$. When this is added to the abstract domain, the set $A'_2$ (which is $A_2$ in the new abstraction) is no longer reachable by the time successors operation.
(b) Refinement for the reset operation. The interpolant that separates $\mathrm{Free}_x(A_1)$ from $\mathrm{Free}_x(A_2)$ contains the constraint $x < 2$. When this is added to the abstract domain, the set $A'_2$ (which is $A_2$ in the new abstraction) is no longer reachable by the reset operation.


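Lemma 13. Consider $x \in C$ and $(A_1, A_2) \in \mathrm{Reset}_x^{\alpha}$ such that $[\![A_1]\!][x \leftarrow 0] \cap [\![A_2]\!] = \emptyset$. Then $\mathrm{Free}_x([\![A_1]\!]) \cap \mathrm{Free}_x([\![A_2]\!]) = \emptyset$. Moreover, if $\alpha'$ is obtained by refinement of $\alpha$ by $\mathrm{interp}(\mathrm{Free}_x([\![A_1]\!]), \mathrm{Free}_x([\![A_2]\!]))$, then for all $(A'_1, A'_2) \in \mathrm{Reset}_x^{\alpha'}$ with $[\![A'_1]\!] \subseteq [\![A_1]\!]$, we have $[\![A'_2]\!] \cap [\![A_2]\!] = \emptyset$.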


5 Experiments 


We implemented both algorithms. The symbolic version was implemented in OCaml using the CUDD library²; the explicit version was implemented in C++ within an existing model checker using the Uppaal DBM library. Both prototypes take as input networks of timed automata with invariants, discrete variables, and urgent and committed locations. The presented algorithms are adapted to these features without difficulty.

² http://visi.colorado.edu/~fabio/.

We evaluated our algorithms on three classes of benchmarks that we believe are significant. We compare the performance of our algorithms with that of Uppaal [7], which is based on zones, as well as with the BDD-based model-checking engine of PAT [25]. We were unable to compare with RED [30], which is no longer maintained and not open source, and with which we failed to obtain correct results. The tool used in [16] was not available either. We thus only provide a comparison here with two well-maintained tools.

Two of our benchmarks are variants of schedulability-analysis problems 
where task execution times depend on the internal states of executed processes, 
so that an analysis of the state space is necessary to obtain a precise answer. 


Monoprocess Scheduling Analysis. In this variant, a single process sequen- 
tially executes tasks on a single machine, and the execution time of each cycle 
depends on the state of the process. The goal is to determine a bound on the 
maximum execution time of a single cycle. This depends on the semantics of the 
process since the bound depends on the reachable states. 

More precisely, we built a set of benchmarks where the processes are defined by synchronous circuit models taken from the Synthesis Competition (http://www.syntcomp.org). We assume that each latch of the circuit is associated with a resource, and that changing the state of the resource takes some amount of time. So a subset of the latches have clocks associated with them, which measure the time elapsed since the latest value change (the latest moment when the value changed from 0 to 1, or from 1 to 0). We provide two positive time bounds $\ell_0$ and $\ell_1$ for each latch, which determine the execution time as follows: if the value of the latch changes from 0 to 1 (resp. from 1 to 0), then the execution time of the present cycle cannot be less than $\ell_1$ (resp. $\ell_0$). The execution time of the step is then the minimum that satisfies these constraints.


Multi-process Stateful Scheduling Analysis. In this variant, three processes 
are scheduled on two machines with a round-robin policy. Processes schedule 
tasks one after the other without any delay. As in the previous benchmarks, 
a process executing a task (on any machine) corresponds to a step of the synchronous circuit model. Each task is described by a tuple $(C_1, C_2, D)$ which defines the minimum and maximum execution times, and the relative deadline. When a task finishes, the next task arrives immediately. The values in the tuple depend on the state of the process. The goal is to check the absence of any deadline miss. Processes are also instantiated with AIG circuits from http://www.syntcomp.org.


Asynchronous Computation. We consider an asynchronous network of 
“threshold gates”, defined as follows: each gate is characterized by a tuple 
$(n, \theta, [l, u])$, where $n$ is the number of inputs, $0 < \theta \le n$ is the threshold, and $l \le u$ are lower and upper bounds on activation time. Each gate has an output which is initially undefined. The gate becomes active during the time period $[l, u]$. During this time, if all inputs are defined and at least $\theta$ of the inputs have value 1, then it sets its output to 1. At the end of the time period, it becomes deactivated and the output becomes undefined again, until the next period, which starts $l$ time units after the deactivation. The goal is to check whether the given gate can output 1 within a given time bound $T$.
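The activation rule of a single threshold gate can be written as a small function. This Python sketch is our own reading of the description above: `None` encodes an undefined value, `active` says whether the current time lies in the activation period, and when the gate is active but the condition fails, the output is left unchanged, since the description only states when it is set to 1.

```python
from typing import Optional, Sequence


def gate_output(inputs: Sequence[Optional[int]], theta: int,
                current: Optional[int], active: bool) -> Optional[int]:
    if not active:
        return None            # outside the activation period the output is undefined
    if all(v is not None for v in inputs) and sum(inputs) >= theta:
        return 1               # all inputs defined and at least theta of them equal 1
    return current             # otherwise the output keeps its current value
```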


Results. Figure 3 displays the results of our experiments. All algorithms were given 8 GB of memory and a timeout of 30 min, and the experiments were run on a laptop with an Intel i7 processor at 3.2 GHz running Linux. The symbolic algorithm performs best among all on the monoprocess and multiprocess scheduling benchmarks. Uppaal is the second best, but does not solve as many benchmarks as our algorithm. Our enumerative algorithm quickly fails on these benchmarks, often running out of memory. On asynchronous computation benchmarks, our enumerative algorithm performs remarkably well, beating all other algorithms. We ran our tools on the CSMA/CD benchmarks (with 3 to 12 processes); Uppaal performs best, with our enumerative algorithm slightly behind. The symbolic algorithm does not scale, while PAT fails to terminate in all cases.

The tool used for the symbolic algorithm is open source and can be found at 
https://github.com/osankur/symrob along with all the benchmarks. 
Fig. 3. Comparison of our enumerative and symbolic algorithms (referred to as Abs-enumerative and Abs-symbolic) with Uppaal and PAT. Each figure is a cactus plot for the set of benchmarks: a point (X, Y) means that X benchmarks were solved within time bound Y.
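The cactus plots of Fig. 3 can be rebuilt from per-benchmark running times: sort the times of the solved instances, and the $i$-th smallest time $t_i$ yields the point $(i, t_i)$, since $i$ benchmarks are solved within $t_i$. A minimal Python sketch, where `None` marks a failure (timeout or memory-out) and the 1800 s default corresponds to the 30 min timeout:

```python
from typing import List, Optional, Sequence, Tuple


def cactus_points(runtimes: Sequence[Optional[float]],
                  timeout: float = 1800.0) -> List[Tuple[int, float]]:
    solved = sorted(t for t in runtimes if t is not None and t <= timeout)
    return [(i + 1, t) for i, t in enumerate(solved)]
```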
6 Conclusion and Future Work


There are several ways to improve the algorithm. Since the choice of interpolants 
determines the abstraction function and the number of refinements, we assumed 
that taking the minimal interpolant should be preferable as it should keep the 
abstractions as coarse as possible. But it might be better to predict which inter- 
polant is the most adapted for the rest of the computation in order to limit 
future refinements. The number of refinements also depends on the search order,
and although it has already been studied in [23], it could be interesting to study 
it in this case. Generally speaking, it is worth noting that we currently cannot 
predict which (variant of) our algorithms is better suited for which model. 

Several extensions of our algorithms could be developed, e.g. combining our 
algorithms with other methods based on finer abstractions as in [22], integrating 
predicate abstraction on discrete variables, or developing SAT-based versions of 
our algorithms. 
Abstract. A popular method for solving reachability in timed automata
proceeds by enumerating reachable sets of valuations represented as 
zones. A naive enumeration of zones does not terminate. Various ter- 
mination mechanisms have been studied over the years. Coming up with 
efficient termination mechanisms has been remarkably more challenging 
when the automaton has diagonal constraints in guards. 

In this paper, we propose a new termination mechanism for timed 
automata with diagonal constraints based on a new simulation relation 
between zones. Experiments with an implementation of this simulation 
show significant gains over existing methods. 
1 Introduction


Timed automata have emerged as a popular model for systems with real-time
constraints [2]. Timed automata are finite automata extended with real-valued
variables called clocks. All clocks are assumed to start at 0 and increase at the
same rate. Transitions of the automaton can make use of these clocks to disallow
behaviours which violate timing constraints. This is achieved by making use of
guards, which are constraints of the form $x < 5$, $x - y \geq 3$, $y > 7$, etc., where $x, y$
are clocks. A transition guarded by $x < 5$ can be fired only when
the value of clock $x$ is less than 5. Another important feature is the reset of clocks in
transitions. Each transition can specify a subset of clocks whose values become
0 once the transition is fired. The combination of guards and resets makes it possible to
track timing distances between events. A basic question that forms the core of
timed automata technology is reachability: given a timed automaton, does there
exist an execution from its initial state to a final state? This question is known
to be decidable [2]. Various algorithms for this problem have been studied over
the years and have been implemented in tools [6, 21, 26, 28, 31, 32].

Since the clocks are real valued variables, the space of configurations of a 
timed automaton (consisting of a state and a valuation of the clocks) is infinite 
and an explicit enumeration is not possible. The earliest solution to reachability 
was to partition this space into a finite number of regions and build a region 
graph that provides a finite abstraction of the behaviour of the timed automa- 
ton [2]. However, this solution was not practical. Subsequent works introduced 
the use of zones [14]. Zones are special sets of clock valuations with efficient 
data structures and manipulation algorithms [6]. Within zone based algorithms, 
there is a division: forward analysis versus backward analysis. The current
industrial-strength tool UPPAAL [28] implements a forward analysis approach, as this
works better in the presence of other discrete data structures used in UPPAAL 
models [9]. We focus on this forward analysis approach using zones in this paper. 

The forward analysis of a timed automaton essentially enumerates sets of 
reachable configurations stored as zones. Some extra care needs to be taken 
for this enumeration to terminate. Traditional development of timed automata 
made use of extrapolation operators over zones to ensure termination. These are 
functions which map a zone to a bigger zone. Importantly, the range of these 
functions is finite. The goal was to come up with extrapolation operators which 
are sound: adding these extra valuations should not lead to new behaviours. 
This is where the role of simulations between configurations was studied and 
extrapolation operators based on such simulations were devised [14]. A certain
extrapolation operation, now known as $\mathit{Extra}_M$ [5], was proposed, and
reachability using $\mathit{Extra}_M$ was implemented in tools [14].

A seminal paper by Bouyer [9] revealed that $\mathit{Extra}_M$ is not correct in the
presence of diagonal constraints in guards. These are constraints of the form
$x - y \triangleleft c$ where $\triangleleft$ is either $<$ or $\leq$, and $c$ is an integer. Moreover, it was proved
that no such extrapolation operation would be correct when diagonal constraints
are present. It was shown that for automata without diagonal constraints
(henceforth referred to as diagonal-free automata), the extrapolation
works. After this result, developments in timed automata reachability focussed
on the class of diagonal-free automata [4, 5, 23, 24], and diagonal constraints were
mostly sidelined. All these developments have led to quite efficient algorithms
for diagonal-free timed automata.

Diagonal constraints are a useful modeling feature and occur naturally in
certain problems, especially scheduling [3, 17, 20, 27] and logic-automata
translations [16, 25] (see also [29]). It is however known that they do not add any
expressive power: every timed automaton can be converted into a diagonal-free timed
automaton [7]. This conversion suffers from an exponential blowup, which was
later shown to be unavoidable: diagonal constraints can potentially give
exponentially more succinct models [10]. Therefore, a good forward analysis algorithm
that works directly on a timed automaton with diagonal constraints would be
handy. This is the subject of this paper.
Related Work. The first attempt at such an algorithm was to split the
(extrapolated) zones with respect to the diagonal constraints present in the
automaton [6]. This gave a correct procedure, but since zones are split, an enumeration
starts from each small zone, leading to an exponential blow-up in the number
of visited zones. A second attempt was to do a more refined conversion into a
diagonal-free automaton by detecting "relevant" diagonals [13, 30] in an iterative
manner. In order to do this, special data structures storing sets of sets of diagonal
constraints were utilized. In [18] we extended the works [5] and [23] on diagonal-free
automata to the case of diagonal constraints. All these approaches suffer from
either a space or a time bottleneck and do not match the efficiency and
scalability of tools for diagonal-free automata.


Our Contributions. The goal of this paper is to come up with fast algorithms for 
handling diagonal constraints. Since the extrapolation-based approach is a dead
end, we work with simulations between zones directly, as in [23] and [18]. We
propose a new simulation relation between zones that is correct in the presence
of diagonal constraints (Sect. 3). We give an algorithm to test this simulation
between zones (Sect. 4). We have incorporated this simulation test in (an older
version of) the tool TChecker [21] checking reachability for timed automata, and
compared our results with the state-of-the-art tool UPPAAL. Experiments show
an encouraging gain, both in the number of zones enumerated and in the time
taken by the algorithm, sometimes up to four orders of magnitude (Sect. 6). The
main advantage of our approach is that it does not split zones, and furthermore 
it leverages the optimizations studied for diagonal-free automata. 

From a technical point of view, our presentation does not make use of regions 
and instead works with valuations, zones and simulation relations. We think 
that this presentation provides a clearer perspective. As a justification of this
claim, we extend our simulation to timed automata with general updates of
the form $x := c$ and $x := y + d$ in transitions (where $x, y$ are clocks and $c, d$
are constants) in a rather natural manner (Sect. 5). In general, reachability for
timed automata with updates is undecidable [12]. Some decidable cases have 
been proposed for which the algorithms are based on regions. For decidable 
subclasses containing diagonal constraints, no zone based approach has been 
studied. Our proposed method includes these classes, and also benefits from 
zones and standard optimizations studied for diagonal-free automata. 

Missing proofs can be found in the full version of this paper [19]. 


2 Preliminaries 


Let $\mathbb{N}$ be the set of natural numbers, $\mathbb{R}_{\geq 0}$ the set of non-negative reals and $\mathbb{Z}$ the
set of integers. Let $X$ be a finite set of variables ranging over $\mathbb{R}_{\geq 0}$, called clocks.
Let $\Phi(X)$ denote the set of constraints $\varphi$ formed using the following grammar:
$$\varphi ::= x \triangleleft c \mid c \triangleleft x \mid x - y \triangleleft d \mid \varphi \wedge \varphi, \qquad \text{where } x, y \in X,\ c \in \mathbb{N},\ d \in \mathbb{Z}$$
and $\triangleleft \in \{<, \leq\}$. Constraints of the form $x \triangleleft c$ and $c \triangleleft x$ are called non-diagonal
constraints and those of the form $x - y \triangleleft d$ are called diagonal constraints. We
have adopted a convention that in non-diagonal constraints $x \triangleleft c$ and $c \triangleleft x$, the
constant $c$ is restricted to $\mathbb{N}$. A clock valuation $v$ is a function which maps every
clock $x \in X$ to a real number $v(x) \in \mathbb{R}_{\geq 0}$. A valuation $v$ is said to satisfy a guard
$g$, written as $v \models g$, if replacing every $x$ in $g$ with $v(x)$ makes the constraint
$g$ true. For $\delta \in \mathbb{R}_{\geq 0}$ we write $v + \delta$ for the valuation which maps every $x$ to
$v(x) + \delta$. Given a subset of clocks $R \subseteq X$, we write $[R]v$ for the valuation which
maps each $x \in R$ to 0 and each $x \notin R$ to $v(x)$.

A timed automaton $\mathcal{A}$ is a tuple $(Q, X, q_0, T, F)$ where $Q$ is a finite set of
states, $X$ is a finite set of clocks, $q_0 \in Q$ is the initial state, $F \subseteq Q$ is a set
of accepting states and $T \subseteq Q \times \Phi(X) \times 2^X \times Q$ is a set of transitions. Each
transition $t \in T$ is of the form $(q, g, R, q')$ where $q$ and $q'$ are respectively the
source and target states, $g$ is a constraint called the guard, and $R$ is a set of
clocks which are reset in $t$. We call a timed automaton diagonal-free if guards
in transitions do not use diagonal constraints.

A configuration of $\mathcal{A}$ is a pair $(q, v)$ where $q \in Q$ and $v$ is a valuation. The
semantics of a timed automaton is given by a transition system $S_{\mathcal{A}}$ whose states
are the configurations of $\mathcal{A}$. Transitions in $S_{\mathcal{A}}$ are of two kinds: delay transitions
are given by $(q, v) \xrightarrow{\delta} (q, v + \delta)$ for all $\delta \geq 0$, and action transitions are given by
$(q, v) \xrightarrow{t} (q', v')$ for each $t := (q, g, R, q')$, if $v \models g$ and $v' = [R]v$. We write $\xrightarrow{\delta, t}$ for
a sequence of delay $\delta$ followed by action $t$. A run of $\mathcal{A}$ is an alternating sequence of
delay-action transitions starting from the initial state $q_0$ and the initial valuation
$\mathbf{0}$ which maps every clock to 0:
$$(q_0, \mathbf{0}) \xrightarrow{\delta_0, t_0} (q_1, v_1) \xrightarrow{\delta_1, t_1} \cdots \xrightarrow{\delta_{n-1}, t_{n-1}} (q_n, v_n).$$
A run of the above form is said to be accepting if the last state $q_n \in F$. The reachability
problem for timed automata is the following: given an automaton $\mathcal{A}$, decide if
there exists an accepting run. This problem is known to be PSPACE-complete [2].
Since the semantics $S_{\mathcal{A}}$ is infinite, solutions to the reachability problem work with
a finite abstraction of $S_{\mathcal{A}}$ that is sound and complete. Before we explain one of
the popular solutions to reachability, we state a result which allows us to convert
every timed automaton into a diagonal-free timed automaton.


Theorem 1. [7] For every timed automaton $\mathcal{A}$, there exists a diagonal-free
timed automaton $\mathcal{A}_{df}$ s.t. there is a bijection between runs of $\mathcal{A}$ and $\mathcal{A}_{df}$. The
number of states in $\mathcal{A}_{df}$ is $2^d \cdot n$ where $d$ is the number of diagonal constraints
and $n$ is the number of states of $\mathcal{A}$.


The above theorem allows us to solve the reachability of a timed automaton $\mathcal{A}$
by first converting it into the diagonal-free automaton $\mathcal{A}_{df}$ and then checking
reachability on $\mathcal{A}_{df}$. However, this conversion comes with a systematic exponential
blowup (in terms of the number of diagonal constraints present in $\mathcal{A}$). It was
shown in [10] that such a blowup is unavoidable in general. We will now recall
the general algorithm for analyzing timed automata, and then move into specific
details which depend on whether the automaton has diagonal constraints or not.


Zones and Simulations. Fix a timed automaton A with clock set X for the 
rest of the discussion in this section. As the space of valuations of A is infinite, 
algorithms work with sets of valuations called zones. A zone is a set of clock
valuations given by a conjunction of constraints of the form $x - y \triangleleft c$, $x \triangleleft c$ and
$c \triangleleft x$ where $c \in \mathbb{Z}$ and $\triangleleft \in \{<, \leq\}$; for example, the set of solutions of $x - y < 5 \wedge y < 10$
is a zone. The transition relation over configurations $(q, v)$ is extended to $(q, Z)$
where $Z$ is a zone. We define the following operations on zones given a guard $g$
and a set of clocks $R$: time elapse $\overrightarrow{Z} := \{v + \delta \mid v \in Z, \delta \geq 0\}$; guard intersection
$Z \wedge g := \{v \mid v \in Z \text{ and } v \models g\}$; and reset $[R]Z := \{[R]v \mid v \in Z\}$. It can be shown
that all these operations result in zones. Zones can be efficiently represented and
manipulated using Difference Bound Matrices (DBMs) [15].
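To make the three zone operations concrete, here is a minimal sketch of a DBM in Python. It is a textbook encoding chosen for illustration, not the data structure of UPPAAL or TChecker: entry `d[i][j]` is a bound `(c, strict)` on $x_i - x_j$, index 0 stands for the constant 0, and the names `Zone`, `elapse`, `constrain` and `reset` are hypothetical. Canonical form is obtained with Floyd-Warshall, and emptiness shows up as a negative cycle on the diagonal.

```python
import math

# A bound (c, strict) encodes "x_i - x_j < c" if strict, else "x_i - x_j <= c".
INF = (math.inf, True)
LE_ZERO = (0, False)


def key(b):
    # Sort key: smaller means tighter; (c, <) is tighter than (c, <=).
    c, strict = b
    return (c, 0 if strict else 1)


def bmin(a, b):
    return a if key(a) <= key(b) else b


def badd(a, b):
    # Bound on a sum of two differences; strict if either summand is strict.
    return (a[0] + b[0], a[1] or b[1])


class Zone:
    """Hypothetical DBM over clocks 1..n; index 0 is the constant 0.

    d[i][j] is the bound on x_i - x_j.
    """

    def __init__(self, n):
        # Initial valuation: every clock equals 0.
        self.n = n
        self.d = [[LE_ZERO] * (n + 1) for _ in range(n + 1)]

    def canonicalize(self):
        # Floyd-Warshall: tighten every bound through every intermediate index.
        size = self.n + 1
        for k in range(size):
            for i in range(size):
                for j in range(size):
                    via_k = badd(self.d[i][k], self.d[k][j])
                    self.d[i][j] = bmin(self.d[i][j], via_k)
        return self

    def is_empty(self):
        # Empty iff some d[i][i] is tighter than "<= 0" (a negative cycle).
        return any(key(self.d[i][i]) < key(LE_ZERO) for i in range(self.n + 1))

    def elapse(self):
        # Time elapse: drop the upper bound x_i - 0 of every clock.
        for i in range(1, self.n + 1):
            self.d[i][0] = INF
        return self.canonicalize()

    def constrain(self, i, j, c, strict):
        # Guard intersection with x_i - x_j (< or <=) c; j = 0 encodes x_i (< or <=) c,
        # and i = 0 with bound -c encodes c (< or <=) x_j.
        self.d[i][j] = bmin(self.d[i][j], (c, strict))
        return self.canonicalize()

    def reset(self, r):
        # Reset clock r to 0 by copying the bounds of the zero index.
        # Assumes the DBM is canonical, as it is after the operations above.
        for j in range(self.n + 1):
            self.d[r][j] = self.d[0][j]
            self.d[j][r] = self.d[j][0]
        self.d[r][r] = LE_ZERO
        return self
```

A zone-graph successor $\overrightarrow{[R](Z \wedge g)}$ would chain `constrain` (once per atomic constraint of $g$), `reset` (once per clock of $R$) and `elapse`, in that order.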

The zone graph $ZG(\mathcal{A})$ of timed automaton $\mathcal{A}$ is a transition system whose
nodes are of the form $(q, Z)$ where $q$ is a state of $\mathcal{A}$ and $Z$ is a zone. For
each transition $t := (q, g, R, q')$ of $\mathcal{A}$, and each node $(q, Z)$, there is a
transition $(q, Z) \xRightarrow{t} (q', Z')$ where $Z' = \overrightarrow{[R](Z \wedge g)}$. The initial node is $(q_0, Z_0)$ where
$q_0$ is the initial state of $\mathcal{A}$ and $Z_0 = \{\mathbf{0} + \delta \mid \delta \geq 0\}$ is the zone obtained by
elapsing an arbitrary delay from the initial valuation. A path in the zone graph
is a sequence $(q_0, Z_0) \xRightarrow{t_0} (q_1, Z_1) \xRightarrow{t_1} \cdots \xRightarrow{t_{n-1}} (q_n, Z_n)$ starting from the
initial node. The path is said to be accepting if $q_n$ is an accepting state. The
zone graph is known to be sound and complete for reachability.


Theorem 2. [14] $\mathcal{A}$ has an accepting run iff $ZG(\mathcal{A})$ has an accepting path.


This does not yet give an algorithm as the zone graph ZG(A) is still not 
finite. Moreover, there are examples of automata for which the reachable part 
of ZG(A) is also infinite: starting from the initial node, applying the successor 
computation leads to infinitely many zones. Two different approaches have been 
studied to get finiteness, both of them based on the usage of simulation relations. 

A (time-abstract) simulation relation $\preccurlyeq$ between configurations of $\mathcal{A}$ is a
reflexive and transitive relation such that $(q, v) \preccurlyeq (q', v')$ implies $q = q'$ and (1)
for every $\delta \geq 0$, there exists $\delta' \geq 0$ such that $(q, v + \delta) \preccurlyeq (q, v' + \delta')$, and (2)
for every transition $t$ of $\mathcal{A}$, if $(q, v) \xrightarrow{t} (q_1, v_1)$ then $(q, v') \xrightarrow{t} (q_1, v'_1)$ such that
$(q_1, v_1) \preccurlyeq (q_1, v'_1)$.

We say $v \preccurlyeq v'$, read as $v$ is simulated by $v'$, if $(q, v) \preccurlyeq (q, v')$ for all states
$q$. The simulation relation can be extended to zones: $Z \preccurlyeq Z'$ if for every $v \in Z$
there exists $v' \in Z'$ such that $v \preccurlyeq v'$. We write $\downarrow Z$ for $\{v \mid \exists v' \in Z \text{ s.t. } v \preccurlyeq v'\}$.
The simulation relation $\preccurlyeq$ is said to be finite if the function mapping zones $Z$ to
the down sets $\downarrow Z$ has finite range. We now recall a specific simulation relation
$\preccurlyeq_{LU}$ [5, 23]. Current algorithms and tools for diagonal-free automata are based
on this simulation. The conditions required for $v \preccurlyeq_{LU} v'$ ensure that when all
lower bound constraints $c \triangleleft x$ satisfy $c \leq L(x)$ and all upper bound constraints
$x \triangleleft c$ satisfy $c \leq U(x)$, whenever $v$ satisfies a constraint, $v'$ will also satisfy it.


Definition 1 (LU-bounds and the relation $\preccurlyeq_{LU}$ [5, 23]). An LU-bounds
function is a pair of functions $L : X \to \mathbb{N} \cup \{-\infty\}$ and $U : X \to \mathbb{N} \cup \{-\infty\}$ that
map each clock to either a non-negative constant or $-\infty$. Given an LU-bounds
function, we define $v \preccurlyeq_{LU} v'$ for valuations $v, v'$ if for every clock $x \in X$:
$$v'(x) < v(x) \implies L(x) < v'(x) \quad\text{and}\quad v(x) < v'(x) \implies U(x) < v(x).$$
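Definition 1 can be read off directly as a check on two concrete valuations. The sketch below assumes valuations and bounds are Python dictionaries keyed by clock name and uses `-math.inf` for $-\infty$; `lu_leq` is a hypothetical helper name. Checking $Z \preccurlyeq_{LU} Z'$ on zones needs the dedicated $O(|X|^2)$ algorithm of [23], which this pointwise test does not implement.

```python
import math

NEG_INF = -math.inf  # stands for the bound -infinity


def lu_leq(v, w, L, U):
    """Return True iff v <=_LU w as in Definition 1.

    v, w: valuations as dicts clock -> non-negative float.
    L, U: dicts clock -> non-negative int or NEG_INF.
    """
    for x in v:
        # If w is below v on x, w must still exceed every relevant lower bound.
        if w[x] < v[x] and not L[x] < w[x]:
            return False
        # If w is above v on x, v must already exceed every relevant upper bound.
        if v[x] < w[x] and not U[x] < v[x]:
            return False
    return True
```

For example, with $L(x) = 2$ and $U(x) = 5$, the valuation $x = 6$ is simulated by $x = 9$: both already violate every upper bound up to 5, and both satisfy every lower bound up to 2.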
Reachability in Diagonal-Free Timed Automata. A natural method to
get finiteness of the zone graph is to prune the zone graph computation through
simulations $Z \preccurlyeq Z'$: do not explore a node $(q, Z)$ if there is an already visited
node $(q, Z')$ such that $Z \preccurlyeq Z'$. Since these simulation tests need to be done often
during the zone graph computation, an efficient algorithm for performing this
test is crucial. Note that $Z \preccurlyeq Z'$ iff $Z \subseteq \downarrow Z'$. However, it is known that the set
$\downarrow Z'$ is not necessarily a zone (this was proved for $\downarrow_{LU} Z'$ in [5]), and hence
simple zone inclusion tests are not applicable. The first algorithms for timed automata
followed a different approach, which we call the extrapolation approach. In this
approach, whenever a new zone $Z$ is discovered by the algorithm, a new zone
$\mathit{Extra}(Z) \supseteq Z$ gets computed and stored in the place of $Z$.


Reachability Algorithm Using Zone Extrapolation. The input to the algorithm is
a timed automaton $\mathcal{A}$. The algorithm maintains two lists, Passed and Waiting.
Initially, the node $(q_0, \mathit{Extra}(Z_0))$ is added to the Waiting list (recall that $(q_0, Z_0)$
is the initial node of the zone graph $ZG(\mathcal{A})$). W.l.o.g. we assume that $q_0$ is not
accepting. The algorithm repeatedly performs the following steps:

Step 1. If Waiting is empty, then return “$\mathcal{A}$ has no accepting run”; else pick
(and remove) a node $(q, Z)$ from Waiting. Add $(q, Z)$ to Passed.

Step 2. For each transition $t := (q, g, R, q_1)$, compute the successor $(q, Z) \xRightarrow{t}
(q_1, Z_1)$. If $Z_1 \neq \emptyset$, perform the following operations: if $q_1$ is accepting, return
“$\mathcal{A}$ has an accepting run”; else compute $Z_1 := \mathit{Extra}(Z_1)$ and check if there
exists a node $(q_1, Z'_1)$ in Passed or Waiting such that $Z_1 \subseteq Z'_1$: if yes, ignore
the node $(q_1, Z_1)$, otherwise add $(q_1, Z_1)$ to Waiting.


Several extrapolation operators ($\mathit{Extra}_M$, $\mathit{Extra}_{LU}$, $\mathit{Extra}^+_{LU}$) were introduced
in [5]. The function $\mathit{Extra}^+_{LU}$ has nice properties: (1) $\mathit{Extra}^+_{LU}(Z) \subseteq \downarrow_{LU} Z$ and (2)
$\mathit{Extra}^+_{LU}(Z)$ is a zone for all $Z$. These properties give an algorithm that performs
only efficient zone operations: successor computations and zone inclusions.


Reachability Algorithm Using Simulations. The initial node $(q_0, Z_0)$ is added
to the Waiting list. W.l.o.g. we assume that $q_0$ is not accepting. The algorithm
repeatedly performs the following steps:

Step 1. If Waiting is empty, then return “$\mathcal{A}$ has no accepting run”; else pick
(and remove) a node $(q, Z)$ from Waiting. Add $(q, Z)$ to Passed.

Step 2. For each transition $t := (q, g, R, q_1)$, compute the successor $(q, Z) \xRightarrow{t}
(q_1, Z_1)$. If $Z_1 \neq \emptyset$, perform the following operations: if $q_1$ is accepting, return
“$\mathcal{A}$ has an accepting run”; else check if there exists a node $(q_1, Z'_1)$ in Passed
or Waiting such that $Z_1 \preccurlyeq Z'_1$: if yes, ignore the node $(q_1, Z_1)$, otherwise add
$(q_1, Z_1)$ to Waiting.
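Both reachability algorithms above share the same Passed/Waiting skeleton and differ only in the pruning test. The sketch below makes that test a parameter: for the extrapolation variant, `succ` would return extrapolated zones and `covered` would be plain zone inclusion; for the simulation variant, `covered` would be a simulation test such as $Z_1 \preccurlyeq Z'_1$. It is a hypothetical illustration: nodes, successor function and tests are supplied by the caller, the queue gives a breadth-first order, and, unlike the text, it checks the initial node instead of assuming w.l.o.g. that $q_0$ is not accepting.

```python
from collections import deque


def reachable(init, succ, accepting, covered):
    """Hypothetical Passed/Waiting search over abstract nodes (q, Z).

    succ(node)      -> iterable of successor nodes with non-empty zones
    accepting(node) -> True if the node's state is accepting
    covered(n, m)   -> pruning test: same state and Z_n included in Z_m
                       (after extrapolation) or Z_n simulated by Z_m
    """
    if accepting(init):
        return True
    passed, waiting = [], deque([init])
    while waiting:                       # Step 1
        node = waiting.popleft()
        passed.append(node)
        for nxt in succ(node):           # Step 2
            if accepting(nxt):
                return True
            if any(covered(nxt, old) for old in list(passed) + list(waiting)):
                continue                 # pruned: an existing node covers it
            waiting.append(nxt)
    return False
```

Termination comes entirely from `covered`: with a finite simulation, only finitely many pairwise non-covered nodes can be added to Waiting.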


An $O(|X|^2)$ algorithm for $Z \preccurlyeq_{LU} Z'$ was proposed in [23]. The efficiency of
this simulation check makes it well suited for use in practice. Moreover, as
$\mathit{Extra}^+_{LU}(Z) \subseteq \downarrow_{LU} Z$, we expect to get more simulations (and hence quicker
termination) through $\preccurlyeq_{LU}$.
Reachability in the Presence of Diagonal Constraints. The $\preccurlyeq_{LU}$ relation
is no longer a simulation when diagonal constraints are present. Moreover, it was
shown in [9] that no extrapolation operator (along the lines of $\mathit{Extra}^+_{LU}$) can work
in the presence of diagonal constraints. The first option to deal with diagonals is
to use Theorem 1 to get a diagonal-free automaton and then apply the methods
discussed previously. One problem with this is the systematic exponential blowup
introduced in the number of states of the resulting automaton. Another problem
is to get diagnostic information: counterexamples need to be translated back to
the original automaton [6]. Various methods have been studied to circumvent
the diagonal-free conversion and instead work on the automaton with diagonal
constraints directly. We recall the approach used in the state-of-the-art tool
UPPAAL below.


Zone Splitting [6]. The paper introducing timed automata gave a notion of
equivalence between valuations $v \sim_M v'$ parameterized by a function $M$ mapping each
clock $x$ to the maximum constant $M(x)$ among the guards of the automaton that
involve $x$. This equivalence is a finite simulation for diagonal-free automata.
Equivalence classes of $\sim_M$ are called regions. This was extended to the diagonal
case by [6] as: $v \sim^d_M v'$ if $v \sim_M v'$ and for all diagonal constraints $g$ present in
the automaton, if $v \models g$ then $v' \models g$. The $\sim^d_M$ relation splits the regions further,
such that each region is either entirely included inside $g$ or entirely outside $g$, for
each $g$. The next step is to use this notion of equivalence in zones. The paper [6]
follows the extrapolation approach: to each zone $Z$, an extrapolation operation
$\mathit{Extra}_M(Z)$ is applied; this adds some valuations which are $\sim_M$-equivalent to
valuations in $Z$; then the result is further split into multiple zones, so that each small
zone is either inside $g$ or outside $g$ for each diagonal constraint $g$. If $d$ is the
number of diagonal constraints present in the automaton, this splitting process
can give rise to $2^d$ zones for each zone $Z$. From each small zone, the zone graph
computation is started. Essentially, the exponential blow-up at the state level
which appeared in the diagonal-free conversion now appears at the zone level.

In this paper, we propose a new simulation to handle diagonal constraints.
This has two advantages: using it avoids the blow-up in the number of nodes
arising due to zone splitting, and the simulation test between zones has an
efficient implementation that is significantly quicker than the simulation of [18].


3 A New Simulation Relation 


We start with a definition of a relation between timed automata configurations,
which in some sense “declares” upfront what we need out of a simulation relation
that can be used in a reachability algorithm. As we proceed, we will make its
description more concrete and give an effective, implementable algorithm for
testing this simulation between zones. Fix a clock set $X$; this determines the set of
constraints $\Phi(X)$.


Definition 2 (the relation $\sqsubseteq_G$). Let $G$ be a (finite or infinite) set of
constraints. We say $v \sqsubseteq_G v'$ if for all $\varphi \in G$ and all $\delta \geq 0$, $v + \delta \models \varphi$ implies
$v' + \delta \models \varphi$.
Our goal is to utilize the above relation in a simulation (as defined in Sect. 2)
for a timed automaton. Directly from the definition, we get the following lemma,
which shows that the $\sqsubseteq_G$ relation is preserved under time elapse.


Lemma 1. If $v \sqsubseteq_G v'$, then $v + \delta \sqsubseteq_G v' + \delta$ for all $\delta \geq 0$.


The other kind of transformation over valuations is resets. Given sets of
guards $G_1$, $G$ and a set of clocks $R$, we want to find conditions on $G_1$ and $G$ so
that if $v \sqsubseteq_{G_1} v'$ then $[R]v \sqsubseteq_G [R]v'$. To do this, we need to answer the following
question: what guarantees should we ensure for $v, v'$ (via $G_1$) so that $[R]v \sqsubseteq_G [R]v'$?
This motivates the next definition.


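Definition 3 (weakest pre-condition of $\sqsubseteq_\varphi$ over resets). For a constraint
$\varphi$ and a set of clocks $R$, we define a set of constraints $\mathit{wp}(\sqsubseteq_\varphi, R)$ as follows:
when $\varphi$ is of the form $x \triangleleft c$ or $c \triangleleft x$, then $\mathit{wp}(\sqsubseteq_\varphi, R)$ is empty if $x \in R$ and is
$\{\varphi\}$ otherwise; when $\varphi$ is a diagonal constraint $x - y \triangleleft c$, then $\mathit{wp}(\sqsubseteq_\varphi, R)$ is:

- $\{x - y \triangleleft c\}$ if $\{x, y\} \cap R = \emptyset$
- $\{x \triangleleft c\}$ if $y \in R$, $x \notin R$ and $c \geq 0$
- $\{-c \triangleleft y\}$ if $x \in R$, $y \notin R$ and $-c \geq 0$
- empty, otherwise.

For a set of guards $G$, we define $\mathit{wp}(\sqsubseteq_G, R) := \bigcup_{\varphi \in G} \mathit{wp}(\sqsubseteq_\varphi, R)$.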
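Definition 3 is a purely syntactic case split, so it can be implemented directly. In this sketch, atomic constraints are encoded as tuples, an encoding chosen for illustration only: `('ub', x, c, strict)` for $x \triangleleft c$, `('lb', c, x, strict)` for $c \triangleleft x$ and `('diag', x, y, c, strict)` for $x - y \triangleleft c$, where `strict` selects $<$ over $\leq$. The function names are hypothetical.

```python
def wp_reset(phi, R):
    """wp(<=_phi, R) of Definition 3 for one atomic constraint phi.

    Hypothetical encoding: ('ub', x, c, s) is x < c (s=True) or x <= c,
    ('lb', c, x, s) is c < x or c <= x, ('diag', x, y, c, s) is x - y < c or <= c.
    """
    if phi[0] == 'ub':
        return set() if phi[1] in R else {phi}
    if phi[0] == 'lb':
        return set() if phi[2] in R else {phi}
    _, x, y, c, s = phi
    if x not in R and y not in R:
        return {phi}                      # untouched by the reset
    if y in R and x not in R and c >= 0:
        return {('ub', x, c, s)}          # y becomes 0: x (< or <=) c
    if x in R and y not in R and -c >= 0:
        return {('lb', -c, y, s)}         # x becomes 0: -c (< or <=) y
    return set()                          # trivially true or false after the reset


def wp_reset_set(G, R):
    """wp(<=_G, R): union of wp(<=_phi, R) over all phi in G."""
    out = set()
    for phi in G:
        out |= wp_reset(phi, R)
    return out
```

For example, resetting $y$ turns the diagonal $x - y < 3$ into the upper bound $x < 3$, while resetting $x$ makes it trivially true (since $-y < 3$ always holds), so nothing needs to be remembered.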


Note that the relation $\sqsubseteq_G$ is parameterized by a set of constraints. Additionally,
we desire this set to be finite, so that the relation can be used in an
algorithm. We need to first link an automaton $\mathcal{A}$ with such a set of constraints.
One way to do it is to take the set of all guards present in the automaton and 
to close it under weakest pre-conditions with respect to all possible subsets of 
clocks. A better approach is to consider a set of constraints for each state, as in 
[4] where the parameters for extrapolation (the maximum constants appearing 
in guards) are calculated at each state. 


Definition 4 (State based guards). Let $\mathcal{A} = (Q, X, q_0, T, F)$ be a timed
automaton. We associate a set of guards $G(q)$ with each state $q \in Q$, which is the
least set of guards (for the coordinate-wise subset inclusion order) such that for
every transition $(q, g, R, q')$: the guard $g$ and the set $\mathit{wp}(\sqsubseteq_{G(q')}, R)$ are present
in $G(q)$. More precisely, $\{G(q)\}_{q \in Q}$ is the least solution to the following set of
equations written for each $q \in Q$:
$$G(q) = \bigcup_{(q, g, R, q') \in T} \{g\} \cup \mathit{wp}(\sqsubseteq_{G(q')}, R)$$

All constraints present in the set $\mathit{wp}(\sqsubseteq_{G(q')}, R)$ contain constants which are
already present in $G(q')$. The least solution to the above set of equations can
therefore be obtained by a fixed point computation which starts with $G(q)$ set to
$\bigcup_{(q, g, R, q') \in T} \{g\}$ and then repeatedly updates the weakest pre-conditions. Since
no new constants are generated in this process, the fixed point computation
terminates. We now have the ingredients to define a simulation relation over
configurations of a timed automaton with diagonal constraints.
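The fixed point computation above can be sketched as a plain iteration until no set grows. This is an illustration under stated assumptions, not the tool's implementation: each guard is given as a set of atomic constraints, encoded as tuples `('ub', x, c, strict)` for $x \triangleleft c$, `('lb', c, x, strict)` for $c \triangleleft x$ and `('diag', x, y, c, strict)` for $x - y \triangleleft c$, and the block bundles a compact version of the reset pre-condition of Definition 3 so that it is self-contained. `state_guards` is a hypothetical name.

```python
def wp_reset(phi, R):
    # Definition 3 for one atomic constraint, in the tuple encoding above.
    if phi[0] == 'ub':
        return set() if phi[1] in R else {phi}
    if phi[0] == 'lb':
        return set() if phi[2] in R else {phi}
    _, x, y, c, s = phi
    if x not in R and y not in R:
        return {phi}
    if y in R and x not in R and c >= 0:
        return {('ub', x, c, s)}
    if x in R and y not in R and -c >= 0:
        return {('lb', -c, y, s)}
    return set()


def state_guards(transitions):
    """Least solution of G(q) = U_{(q,g,R,q') in T} {g} U wp(<=_{G(q')}, R).

    transitions: list of (q, g, R, q2) with g a set of atomic constraints.
    """
    G = {}
    for q, g, _, q2 in transitions:      # start from the guards themselves
        G.setdefault(q, set()).update(g)
        G.setdefault(q2, set())
    changed = True
    while changed:                       # terminates: no new constants appear
        changed = False
        for q, _, R, q2 in transitions:
            new = set()
            for phi in G[q2]:
                new |= wp_reset(phi, R)
            new -= G[q]
            if new:
                G[q] |= new
                changed = True
    return G
```

Since every set only grows and draws from a finite pool of constraints over the clocks and constants of the automaton, the loop reaches the least fixed point after finitely many rounds.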
Definition 5 ($\mathcal{A}$-simulation). Let $\mathcal{A} = (Q, X, q_0, T, F)$ be a timed automaton
and let the set of guards $G(q)$ of Definition 4 be associated to every state $q \in Q$.
We define a relation $\preccurlyeq_{\mathcal{A}}$ between configurations of $\mathcal{A}$ as $(q, v) \preccurlyeq_{\mathcal{A}} (q, v')$ if
$v \sqsubseteq_{G(q)} v'$.


Lemma 2. The relation $\preccurlyeq_{\mathcal{A}}$ is a simulation on the configurations of timed
automaton $\mathcal{A}$.


As pointed out before, Definition 2 gives a declarative description of the
simulation and it is unclear how to work with it algorithmically, even when the set of
constraints $G$ is finite. The main issue is the $\forall \delta$ quantification, which ranges over
infinitely many delays. We will first provide a characterization that brings out the fact that this
$\forall \delta$ quantification is irrelevant for diagonal constraints (essentially because the value
of $v(x) - v(y)$ does not change with time elapse). Given a set of constraints $G$,
let $G^- \subseteq G$ be the set of non-diagonal constraints in $G$.


Proposition 1. $v \sqsubseteq_G v'$ iff $v \sqsubseteq_{G^-} v'$ and for all diagonal constraints $\varphi \in G$, if
$v \models \varphi$ then $v' \models \varphi$.


It now amounts to solving the $\forall \delta$ problem for non-diagonals. It turns out
that the $\preccurlyeq_{LU}$ simulation almost achieves this. We will see this in more detail in
the next section.


4 Algorithm for $Z \sqsubseteq_G Z'$


Fix a finite set of guards $G$. Restating the definition of $\sqsubseteq_G$ extended to zones:
$Z \sqsubseteq_G Z'$ if for all $v \in Z$ there exists a $v' \in Z'$ such that $v \sqsubseteq_G v'$. In this
section, we will view the characterization of $\sqsubseteq_G$ as in Proposition 1 and give an
algorithm to check $Z \sqsubseteq_G Z'$ that uses as an oracle a test $Z \sqsubseteq_{G^-} Z'$. We discuss
the computation of $Z \sqsubseteq_{G^-} Z'$ later in this section. We start with an observation
following from Proposition 1.


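Lemma 3. Let $\varphi := x - y \triangleleft c$ be a diagonal constraint in $G$. Then $Z \sqsubseteq_G Z'$ if
and only if $Z \cap \varphi \sqsubseteq_{G'} Z' \cap \varphi$ and $Z \cap \neg\varphi \sqsubseteq_{G'} Z'$, where $G' = G \setminus \{\varphi\}$.
If $G$ has no diagonal constraints, $Z \sqsubseteq_G Z'$ if and only if $Z \sqsubseteq_{G^-} Z'$.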


This leads to the following algorithm, consisting of two mutually recursive
procedures. This algorithm is essentially an implementation of the above lemma,
with two optimizations:

- we start with the non-diagonal check in Line 6 of Algorithm 1: if this is
already violated, then the algorithm returns false;
- suppose $Z \sqsubseteq_{G^-} Z'$; the next task is to perform the checks in the first statement
of Lemma 3, which is done by Algorithm 2. Note however that when Algorithm
2 is called, we already have $Z \sqsubseteq_{G^-} Z'$, hence $Z \cap \neg\varphi \sqsubseteq_{G^-} Z'$. Therefore we
use an optimization in Line 7 by calling Algorithm 2 directly (as the check in
Line 6 of Algorithm 1 would be redundant).


50 P. Gastin et al. 


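Algorithm 1
1 check $Z \sqsubseteq_G Z'$:
2   if $Z = \emptyset$:
3     return true
4   if $Z' = \emptyset$:
5     return false
6   if $Z \not\sqsubseteq_{G^-} Z'$:
7     return false
8   return $Z \sqsubseteq^+_G Z'$

Algorithm 2
1 check $Z \sqsubseteq^+_G Z'$:
2   if $G$ does not contain any diagonal constraints:
3     return true
4   pick a diagonal constraint $\varphi := x - y \triangleleft c$ from $G$
5   $G' \leftarrow G \setminus \{\varphi\}$
6   if $Z \cap \neg\varphi \neq \emptyset$:
7     if $Z \cap \neg\varphi \not\sqsubseteq^+_{G'} Z'$:
8       return false
9   return $Z \cap \varphi \sqsubseteq_{G'} Z' \cap \varphi$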


Computing $Z \sqsubseteq_{G^-} Z'$. We will use $\preccurlyeq_{LU}$ to approximate $\sqsubseteq_{G^-}$: in our
implementation of the above algorithms, we replace $Z \sqsubseteq_{G^-} Z'$ with $Z \preccurlyeq_{LU(G)} Z'$. This
works because for an appropriate choice of $LU$ (explained below), we have
$Z \preccurlyeq_{LU(G)} Z' \Rightarrow Z \sqsubseteq_{G^-} Z'$. The converse is not true, as LU-bounds
functions cannot distinguish between guards with $<$ and $\leq$ comparisons. Therefore,
the $\preccurlyeq_{LU}$ simulation does not characterize $v \sqsubseteq_{G^-} v'$ completely. Although we are
aware of the (rather technical) modifications to the $\preccurlyeq_{LU}$ simulation that are needed
for this characterization, we choose to use the existing $\preccurlyeq_{LU}$ directly, as it is safe
to do so and it has already been implemented in tools. This gives us a finer
simulation than $v \sqsubseteq_{G^-} v'$.


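Definition 6 (LU-bounds from $G$). Let $G$ be a finite set of constraints. We
define $LU(G)$ to denote the pair of functions $L_G$ and $U_G$ defined as follows:
$$L_G(x) = \begin{cases} -\infty & \text{if there is no guard of the form } c \triangleleft x \text{ in } G \\ \max\{c \mid c \triangleleft x \in G\} & \text{otherwise} \end{cases}$$
$$U_G(x) = \begin{cases} -\infty & \text{if there is no guard of the form } x \triangleleft c \text{ in } G \\ \max\{c \mid x \triangleleft c \in G\} & \text{otherwise} \end{cases}$$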
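Computing $LU(G)$ amounts to one pass over the non-diagonal constraints of $G$. The sketch below assumes a hypothetical tuple encoding, `('ub', x, c, strict)` for $x \triangleleft c$, `('lb', c, x, strict)` for $c \triangleleft x$ and `('diag', x, y, c, strict)` for diagonals, and uses `-math.inf` for $-\infty$; `lu_from_guards` is a hypothetical name.

```python
import math

NEG_INF = -math.inf


def lu_from_guards(G, clocks):
    """LU(G) of Definition 6 for a set G of atomic constraints.

    Hypothetical encoding: ('ub', x, c, s) is x < c or x <= c,
    ('lb', c, x, s) is c < x or c <= x, ('diag', ...) is a diagonal (ignored).
    Returns two dicts clock -> max constant, or NEG_INF if no such guard.
    """
    L = {x: NEG_INF for x in clocks}
    U = {x: NEG_INF for x in clocks}
    for phi in G:
        if phi[0] == 'lb':
            _, c, x, _strict = phi
            L[x] = max(L[x], c)        # strictness is ignored on purpose
        elif phi[0] == 'ub':
            _, x, c, _strict = phi
            U[x] = max(U[x], c)
    return L, U
```

The strictness flag is dropped when computing the bounds, which is exactly why $\preccurlyeq_{LU(G)}$ cannot separate $<$ from $\leq$ and the converse of Lemma 4 fails.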


Lemma 4. For every set of constraints $G$, $v \preccurlyeq_{LU(G)} v'$ implies $v \sqsubseteq_{G^-} v'$.


The above observations call for the next definition and subsequent lemmas. 


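Definition 7 (approximating $\sqsubseteq_G$). Let $G$ be a finite set of constraints. We
define a relation $\sqsubseteq^{LU}_G$ as follows: $v \sqsubseteq^{LU}_G v'$ if $v \preccurlyeq_{LU(G)} v'$ and for all diagonal
constraints $\varphi \in G$, if $v \models \varphi$ then $v' \models \varphi$. Similarly, define $\preccurlyeq^{LU}_{\mathcal{A}}$ as $(q, v) \preccurlyeq^{LU}_{\mathcal{A}} (q, v')$
if $v \sqsubseteq^{LU}_{G(q)} v'$.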


Lemma 5. The relation $\preccurlyeq^{LU}_{\mathcal{A}}$ is a finite simulation on the configurations of $\mathcal{A}$.
The above lemma and the fact that $Z \preccurlyeq_{LU(G)} Z'$ can be checked in $O(|X|^2)$
[23, 33] imply the following theorem.


Theorem 3. When using $Z \preccurlyeq_{LU(G)} Z'$ in place of $Z \sqsubseteq_{G^-} Z'$, the algorithm
is correct and it terminates in $O(2^d \cdot |X|^2)$, where $d$ is the number of diagonal
guards in $G$.
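Algorithms 1 and 2 can be transcribed almost line by line. To keep the sketch self-contained and runnable, it replaces zones by finite lists of valuations (dictionaries keyed by clock name), so that $Z \cap \varphi$ is a filter and emptiness is an empty list; a real implementation would use DBMs and the $O(|X|^2)$ zone test of [23] as the oracle. Following Theorem 3, the oracle for $Z \sqsubseteq_{G^-} Z'$ is replaced by $Z \preccurlyeq_{LU(G)} Z'$, here checked valuation by valuation. The constraint tuples `('ub', x, c, strict)`, `('lb', c, x, strict)`, `('diag', x, y, c, strict)` and all function names are hypothetical.

```python
import math

NEG_INF = -math.inf


def sat(v, phi):
    """v |= phi for ('ub', x, c, s), ('lb', c, x, s) or ('diag', x, y, c, s)."""
    if phi[0] == 'ub':
        lhs, rhs = v[phi[1]], phi[2]
    elif phi[0] == 'lb':
        lhs, rhs = phi[1], v[phi[2]]
    else:
        lhs, rhs = v[phi[1]] - v[phi[2]], phi[3]
    return lhs < rhs if phi[-1] else lhs <= rhs


def lu_leq(v, w, L, U):
    # v <=_LU w (Definition 1), checked clock by clock.
    return all((v[x] <= w[x] or L[x] < w[x]) and (w[x] <= v[x] or U[x] < v[x])
               for x in v)


def nondiag_leq(Z, Zp, G):
    """Oracle for Z <=_{G^-} Z', replaced by Z <=_{LU(G)} Z' (Lemma 4)."""
    L = {x: NEG_INF for v in Z for x in v}
    U = dict(L)
    for phi in G:
        if phi[0] == 'lb':
            L[phi[2]] = max(L.get(phi[2], NEG_INF), phi[1])
        elif phi[0] == 'ub':
            U[phi[1]] = max(U.get(phi[1], NEG_INF), phi[2])
    return all(any(lu_leq(v, w, L, U) for w in Zp) for v in Z)


def check(Z, Zp, G):
    """Algorithm 1: Z <=_G Z' (zones replaced by finite lists of valuations)."""
    if not Z:
        return True
    if not Zp:
        return False
    if not nondiag_leq(Z, Zp, G):         # Line 6
        return False
    return check_plus(Z, Zp, G)


def check_plus(Z, Zp, G):
    """Algorithm 2: assumes Z <=_{G^-} Z' already holds."""
    diagonals = [phi for phi in G if phi[0] == 'diag']
    if not diagonals:
        return True
    phi = diagonals[0]
    G1 = G - {phi}                        # G' = G \ {phi}
    Z_neg = [v for v in Z if not sat(v, phi)]
    if Z_neg and not check_plus(Z_neg, Zp, G1):   # Lines 6-8
        return False
    Z_pos = [v for v in Z if sat(v, phi)]
    Zp_pos = [w for w in Zp if sat(w, phi)]
    return check(Z_pos, Zp_pos, G1)       # Line 9
```

Line 7 of Algorithm 2 corresponds to the recursive call to `check_plus` on `Z_neg`, which skips the non-diagonal oracle, while Line 9 goes back through `check` because $Z' \cap \varphi$ is a new right-hand side for which the non-diagonal test must be redone.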


From a complexity viewpoint, this algorithm is not efficient, since the number of
calls it makes is exponential in the number of diagonal constraints (in fact,
this may not be avoidable due to Lemma 6, which follows from the NP-hardness
result in [18]). Although the above algorithm does involve many calls, the internal
operations involved in each call are simple zone manipulations. Moreover, the
preliminary checks (for instance Line 6 of Algorithm 1) cut short the number
of calls. This is visible in our experiments, which show very good results, especially with
respect to running time, compared to other methods. A similar hardness was
shown for a different simulation in [18], but there the implementation did
witness the hardness, as the time taken by that algorithm was unsatisfactory.


Lemma 6. Deciding $Z \sqsubseteq^{LU}_G Z'$ is NP-complete.


5 Simulations for Updatable Timed Automata 


In the timed automata considered so far, clocks are allowed to be reset to 0 along
transitions. In this section, we consider more sophisticated transformations of
clocks in transitions. These are called updates. An update $up : \mathbb{R}^X_{\geq 0} \to \mathbb{R}^{|X|}$ is a
function mapping non-negative $|X|$-dimensional reals (valuations) $v$ to general
$|X|$-dimensional reals (which may a priori not be valuations, as the coordinates
may be negative). The syntax of the update function $up$ is given by a set of
atomic updates $up_x$ for each $x \in X$, which are of the form $x := c$ or $x := y + d$
where $c \in \mathbb{N}$, $d \in \mathbb{Z}$ and $y \in X$ (possibly equal to $x$). Note that we want $d$ to be
an integer, since we allow for decrementing clocks; on the other hand, $c \in \mathbb{N}$
since we have non-negative clocks. Given a valuation $v$ and an update $up$, the
vector $up(v)$ is given for each clock $x$ by:
$$up(v)(x) = \begin{cases} c & \text{if } up_x \text{ is } x := c \\ v(y) + d & \text{if } up_x \text{ is } x := y + d \end{cases}$$


Note that in general, due to the presence of updates $x := y + d$, the update $up(v)$
may not yield a clock valuation. However, when it does give a valuation, it can
be used as a transformation in timed automata transitions. We say $up(v) \geq 0$ if
$up(v)(x) \geq 0$ for all clocks $x \in X$.
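Applying an update is a simultaneous substitution: every coordinate of $up(v)$ is computed from the old valuation $v$. The sketch below assumes a hypothetical encoding of atomic updates, `('const', c)` for $x := c$ and `('shift', y, d)` for $x := y + d$, and, as a convenience not in the text, treats a clock with no entry as updated by $x := x + 0$. The names `apply_update` and `is_nonneg` are hypothetical.

```python
def apply_update(v, up):
    """Compute up(v) from the old valuation v (simultaneous update).

    up: dict clock -> ('const', c) for x := c, or ('shift', y, d) for x := y + d.
    A clock without an entry keeps its value (assumed x := x + 0).
    """
    out = {}
    for x in v:
        atom = up.get(x, ('shift', x, 0))
        if atom[0] == 'const':
            out[x] = atom[1]
        else:
            _, y, d = atom
            out[x] = v[y] + d            # read the old v, never the partial out
    return out


def is_nonneg(w):
    """up(v) >= 0: every coordinate is non-negative, so w is a valuation."""
    return all(val >= 0 for val in w.values())
```

Reading only from `v` matters for updates such as a swap $x := y,\ y := x$, which would go wrong if coordinates were updated in place. A transition $(q, g, up, q')$ is enabled only when `is_nonneg(apply_update(v, up))` holds.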

An updatable timed automaton (UTA) $\mathcal{A} = (Q, X, q_0, T, F)$ is an extension
of a classic timed automaton with transitions of the form $(q, g, up, q')$ where $up$
is an update. The semantics extends in the natural way: delay transitions remain the
same, and for action transitions $t := (q, g, up, q')$ we have $(q, v) \xrightarrow{t} (q', v')$ if $v \models g$,
$up(v) \geq 0$, and $v' = up(v)$. We allow the transition only if the update results
in a valuation. The reachability problem for these automata is known to be
undecidable in general [12]. Various subclasses with decidable reachability have
been discussed in the same paper. Decidability proofs in [12] take the following
flavour, for a given automaton $\mathcal{A}$: (1) divide the space of all valuations into a
finite number of equivalence classes called regions; (2) to build the parameters for
the equivalence, derive a set of Diophantine equations from the guards of $\mathcal{A}$; if
they have a solution, then construct the quotient graph of the equivalence (called the
region graph) parameterized by the obtained solution and check reachability on
it; if the equations have no solution, output that reachability for $\mathcal{A}$ cannot be
answered. Sufficient conditions on the nature of the updates that give a solution
to the Diophantine equations have been tabulated in [12]. When the automaton
is diagonal-free, the region equivalence can be used to build an extrapolation
operation, which in turn can be used in a reachability algorithm with zones.
When the automaton contains diagonals, the region equivalence is only used to
build a region graph; no effective zone-based approach has been studied.

We use a similar idea, but with two fundamental differences: (1) we want to obtain reachability through the use of simulations on zones, and (2) we build equations over sets of guards as in Definition 4. The advantage of this approach is that it allows the use of coarser simulations over zones. Even for automata with diagonal constraints and updates, we get a zone-based algorithm, instead of resorting to regions, which are not efficient in practice.

The notion of simulations as in p. xx remains the same, now using the semantics of transitions with updates. We will re-use the simulation relation $\preceq_G$. We need to extend Definition 3 to incorporate updates, which we do below. As notation, for an update function $up$, we write $up(x)$ for $c$ if $up_x$ is $x := c$, and $up(x)$ for $y + c$ if $up_x$ is $x := y + c$.


Definition 8 (weakest pre-condition of $\preceq_\varphi$ over updates).

Let $up$ be an update.

For a constraint $\varphi$ of the form $x < c$ or $c < x$, we define $wp(\preceq_\varphi, up)$ to be respectively $\{up(x) < c\}$ or $\{c < up(x)\}$ if these resulting constraints are of the form $z < d$ or $d < z$ with $z \in X$ and $d > 0$; otherwise $wp(\preceq_\varphi, up)$ is empty.

For a constraint $\varphi : x - y < c$, we define $wp(\preceq_\varphi, up)$ to be $\{up(x) - up(y) < c\}$ if this constraint is either a diagonal using different clocks, or it is of the form $z < d$ or $d < z$ with $d > 0$; otherwise $wp(\preceq_\varphi, up)$ is empty.

For a set of guards $G$, we define $wp(\preceq_G, up) := \bigcup_{\varphi \in G} wp(\preceq_\varphi, up)$.


Some examples: $wp(x < 5, x := x + 10)$ is empty, since $up(x)$ is $x + 10$ and the guard $x + 10 < 5$ is not satisfiable; $wp(x < 5, x := x - 10)$ is $x < 15$; $wp(x < 5, x := c)$ is empty; $wp(x - y < 5, (x := z_1, y := z_2 + 10))$ is $z_1 - (z_2 + 10) < 5$, giving the constraint $z_1 - z_2 < 15$; $wp(x - y < 5, (x := z + c_1, y := z + c_2))$ is empty; and $wp(x - y < 5, (x := c_1, y := z + c_2))$ is $c < z$ with $c := c_1 - 5 - c_2$ if $c > 0$, and is empty otherwise.
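The case split of Definition 8 can be sketched in Python by writing every constraint in a DBM-like form $a - b < c$, where a missing clock stands for the constant 0. This encoding, the tuple layout and the helper names are our own assumptions, chosen so that upper bounds, lower bounds and diagonals go through the same substitution step; the tests below replay the examples just given.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# Assumed encoding (ours): a constraint is a DBM-style entry (a, b, c, strict)
# meaning  a - b < c  (or a - b <= c when strict is False), where None is the
# constant 0.  So ("x", None, 5, True) is x < 5, (None, "x", -3, True) is 3 < x,
# and ("x", "y", 5, True) is the diagonal x - y < 5.
Constraint = Tuple[Optional[str], Optional[str], int, bool]
# Atomic updates: (None, c) is x := c, (y, d) is x := y + d; absent clocks are unchanged.
Update = Dict[str, Tuple[Optional[str], int]]


def up_term(up: Update, x: Optional[str]) -> Tuple[Optional[str], int]:
    """up(x) as (clock or None, offset); the constant 0 maps to itself."""
    return (None, 0) if x is None else up.get(x, (x, 0))


def wp_one(phi: Constraint, up: Update) -> Set[Constraint]:
    """wp of a single constraint, following the case split of Definition 8."""
    a, b, c, strict = phi
    za, da = up_term(up, a)
    zb, db = up_term(up, b)
    d = c - da + db                     # (za + da) - (zb + db) < c  iff  za - zb < d
    if za == zb:                        # only constants remain, or the clock cancels out
        return set()
    if za is not None and zb is not None:
        return {(za, zb, d, strict)}    # a diagonal over two different clocks
    bound = d if zb is None else -d     # constant of  za < d  or of  -d < zb
    return {(za, zb, d, strict)} if bound > 0 else set()


def wp(G: Iterable[Constraint], up: Update) -> Set[Constraint]:
    """wp of a set of guards: the union of wp over its members."""
    return set().union(*(wp_one(phi, up) for phi in G))
```

For instance, with $c_1 = 20$ and $c_2 = 3$ the last example returns `(None, "z", -12, True)`, which encodes $12 < z$, matching $c = c_1 - 5 - c_2$.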


Definition 9 (State based guards). Let $\mathcal{A} = (Q, X, q_0, T, F)$ be a UTA. We associate a set of constraints $G(q)$ with each state $q \in Q$, which is the least set of constraints (for the coordinate-wise subset inclusion order) such that for every transition $(q, g, up, q_1)$: the guard $g$ and the set $wp(\preceq_{G(q_1)}, up)$ are present in $G(q)$, and in addition the constraints that allow the update to happen are also present in $G(q)$. The last condition is given by the weakest precondition of the set of constraints $\{x \geq 0 \mid x \in X\}$. Overall, $\{G(q)\}_{q \in Q}$ is the least solution to the following set of equations, for each $q \in Q$:

$$G(q) = \bigcup_{(q, g, up, q_1) \in T} \Big( \{g\} \cup wp(\preceq_{\{x \geq 0 \mid x \in X\}}, up) \cup wp(\preceq_{G(q_1)}, up) \Big)$$


The least solution $\{G(q)\}_{q \in Q}$ is said to be finite if each $G(q)$ is a finite set of constraints.


In contrast to the simple reset case, the above set of equations may not have a finite solution. Consider a self-looping transition $(q, x < c, x := x - 1, q)$. We require $x < c \in G(q)$. Now, $wp(x < c, x := x - 1)$ is $x < c + 1$, which should be in $G(q)$ according to the above equation. Continuing this process, we need to add $x < d$ for every natural number $d > c$. Indeed, this is consistent with the undecidability of reachability when subtraction updates are allowed. We deal with the subject of finite solutions to the above equations later in this section. On the other hand, when the above system does have a solution with finite $G(q)$ at every $q$, we can use the $\preceq_{\mathcal{A}}$ simulation of Definition 5 and its approximation $\preceq_{\mathcal{A}}^{LU}$ to get an algorithm.
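The divergence in the self-loop example can be spelled out as a short derivation, using the same abuse of notation as the examples after Definition 8: each round of the equation of Definition 9 shifts the constant of the guard by one,

$$wp(x < c,\ x := x - 1) = \{x - 1 < c\} = \{x < c + 1\}, \qquad wp(x < c + 1,\ x := x - 1) = \{x < c + 2\}, \qquad \ldots$$

so every solution must satisfy $\{x < d \mid d \in \mathbb{N},\ d \geq c\} \subseteq G(q)$, which is an infinite set of constraints.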


Proposition 2. Let $\mathcal{A} = (Q, X, q_0, T, F)$ be a UTA. Let $\{G(q)\}_{q \in Q}$ be the least solution to the equations given in Definition 9. Then the relation $\preceq_{\mathcal{A}}$ is a simulation on the configurations of $\mathcal{A}$.


Lemma 7. For a UTA $\mathcal{A}$, assume that the least solution $\{G(q)\}_{q \in Q}$ to the state-based guards equations is finite. Then the relation $\preceq_{\mathcal{A}}^{LU}$ is a finite simulation on the configurations of $\mathcal{A}$.


Finite Solution to the State-Based Guards Equations. The least solution to the equations of Definition 9 can be obtained by a standard Kleene iteration for fixed-point computation. For each $i \geq 0$ and each state $q$, define:

$$G^0(q) = \bigcup_{(q, g, up, q') \in T} \Big( \{g\} \cup wp(\preceq_{\{x \geq 0 \mid x \in X\}}, up) \Big)$$

$$G^{i+1}(q) = G^i(q) \cup \bigcup_{(q, g, up, q') \in T} wp(\preceq_{G^i(q')}, up)$$


The iteration stabilizes when there exists a $k$ satisfying $G^{k+1}(q) = G^k(q)$ for all $q$. At stabilization, the values $G^k(q)$ satisfy the equations of Definition 9, and give the required $G(q)$. However, as we mentioned earlier, this iteration might not stabilize at any $k$. We will now develop some observations that will help detect after finitely many steps whether the iteration will stabilize or not.

Suppose we colour the set $G^{i+1}(q)$ red if either there exists a diagonal constraint $x - y < c \in G^{i+1}(q) \setminus G^i(q)$ (a new diagonal is added) or there exists a non-diagonal constraint $x < c$ or $c < x$ in $G^{i+1}(q) \setminus G^i(q)$ such that the constant $c$ is strictly bigger than $c'$ for, respectively, every non-diagonal $x < c'$ or $c' < x$ in $G^i(q)$ (a non-diagonal with a bigger constant is added). If this condition does not apply, we colour the set $G^{i+1}(q)$ green. The next observations say that the iteration terminates iff we reach a stage where all sets are green. Intuitively, once we reach green, the only constraints that can be added are non-diagonals having smaller (non-negative) constants, and hence the procedure terminates.


Lemma 8. Let $i > 0$. If $G^i(q)$ is green for all $q$, then $G^{i+1}(q)$ is green for all $q$.


Lemma 9. Let $K = 1 + |Q| \cdot |X| \cdot (|X| + 1)$. If there is a state $p$ such that $G^K(p)$ is red, then there is no $i$ such that $G^i(q)$ is green for all $q$.


As to why the bound $K = 1 + |Q| \cdot |X| \cdot (|X| + 1)$ appears in the lemma above: a red state at stage $i$ arises due to the addition of a constraint $\varphi_i$ at state $p_i$, which in turn depends on a state $p_{i-1}$ marked red at stage $i - 1$ due to constraint $\varphi_{i-1}$. If we iterate sufficiently long, we will hit a state $p$, a sequence of transitions from $p$ to $p$ and a constraint $\varphi$ such that computing the weakest precondition over this loop gives a new constraint with the same set of clocks as $\varphi$ but with a different constant. This part can be iterated infinitely often.
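A plausible reading of the constant in $K$, offered as our own sketch of the counting argument rather than as the authors' proof, is that $|X| \cdot (|X| + 1)$ counts the shapes a constraint can take once its constant is ignored:

$$\underbrace{|X|\,(|X| - 1)}_{x - y < c,\ x \neq y} \;+\; \underbrace{|X|}_{x < c} \;+\; \underbrace{|X|}_{c < x} \;=\; |X|\,(|X| + 1).$$

A chain of $K = 1 + |Q| \cdot |X| \cdot (|X| + 1)$ red stages, each witnessed by a pair (state, shape), must therefore repeat some pair by the pigeonhole principle, which yields the loop from $p$ back to $p$ described above.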


Proposition 3. The least solution of the local constraint equations for a UTA is finite iff $G^K(q)$ is green for all $q$, where $K = 1 + |Q| \cdot |X| \cdot (|X| + 1)$.
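The decision procedure suggested by Lemmas 8 and 9 and Proposition 3 can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: constraints use a DBM-style tuple `(a, b, c, strict)` for $a - b < c$ (with `None` for the constant 0), updates use the pair encoding `(src, off)`, and the non-negativity constraint $x \geq 0$ is encoded as $0 - x \leq 0$. The sketch computes $G^0, \ldots, G^K$, returns `None` ("infinite") if some $G^K(q)$ is red, and otherwise keeps iterating. That final loop relies on Lemma 8 and the discussion before it for termination.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional, Set, Tuple

# Assumed encodings (ours, not the paper's):
#   constraint (a, b, c, strict) means a - b < c (<= if not strict), None = constant 0
#   atomic update (None, c) is x := c and (y, d) is x := y + d; absent clocks unchanged
Constraint = Tuple[Optional[str], Optional[str], int, bool]
Update = Dict[str, Tuple[Optional[str], int]]
Transition = Tuple[str, FrozenSet[Constraint], Update, str]   # (q, g, up, q')
Guards = Dict[str, Set[Constraint]]


def wp_one(phi: Constraint, up: Update) -> Set[Constraint]:
    a, b, c, strict = phi
    za, da = (None, 0) if a is None else up.get(a, (a, 0))
    zb, db = (None, 0) if b is None else up.get(b, (b, 0))
    d = c - da + db                          # za - zb < d after substitution
    if za == zb:
        return set()                         # no clock left, or it cancels
    if za is not None and zb is not None:
        return {(za, zb, d, strict)}         # diagonal over different clocks
    bound = d if zb is None else -d
    return {(za, zb, d, strict)} if bound > 0 else set()


def wp(G: Iterable[Constraint], up: Update) -> Set[Constraint]:
    return set().union(*(wp_one(phi, up) for phi in G))


def bound(phi: Constraint) -> int:
    """Constant of a non-diagonal: c for x < c, and c for c < x."""
    return phi[2] if phi[1] is None else -phi[2]


def is_red(old: Set[Constraint], new: Set[Constraint]) -> bool:
    for phi in new - old:
        if phi[0] is not None and phi[1] is not None:
            return True                      # a new diagonal was added
        same_shape = [bound(psi) for psi in old if psi[:2] == phi[:2]]
        if all(bound(phi) > k for k in same_shape):
            return True                      # a bigger constant for this clock and side
    return False


def step(G: Guards, trans: List[Transition]) -> Guards:
    """One Kleene step: add wp(G^i(q'), up) to G^i(q) for every transition (q, g, up, q')."""
    nxt = {q: set(cs) for q, cs in G.items()}
    for q, _, up, q1 in trans:
        nxt[q] |= wp(G[q1], up)
    return nxt


def state_guards(states: List[str], clocks: List[str], trans: List[Transition]) -> Optional[Guards]:
    """Least solution {G(q)} of the equations of Definition 9, or None if it is infinite."""
    K = 1 + len(states) * len(clocks) * (len(clocks) + 1)
    nonneg = [(None, x, 0, False) for x in clocks]       # 0 - x <= 0, i.e. x >= 0
    G: Guards = {q: set() for q in states}
    for q, g, up, _ in trans:                           # G^0
        G[q] |= set(g) | wp(nonneg, up)
    prev = G
    for _ in range(K):                                  # compute G^1, ..., G^K
        prev, G = G, step(G, trans)
    if any(is_red(prev[q], G[q]) for q in states):
        return None                                     # some G^K(q) is red: infinite
    while True:                                         # all green: iterate to the fixpoint
        nxt = step(G, trans)
        if nxt == G:
            return G
        G = nxt
```

On the self-loop $(q, x < 3, x := x - 1, q)$ the sketch returns `None`, whereas on a small two-state automaton with a reset and a diagonal guard it stabilizes after a few rounds.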


Theorem 4. Let $\mathcal{A}$ be a UTA. It is decidable whether the equations in Definition 9 have a finite solution. When these equations do have a finite solution, zone graph enumeration using $\preceq_{\mathcal{A}}^{LU}$ is a sound, complete and terminating procedure for the reachability problem.


All decidable classes of [12] can be shown decidable with our approach, by 
showing stabilization of the G(q) computation. 


Lemma 10. Reachability is decidable in UTA where either guards are non-diagonals and updates are of the form $x := c$, $x := y$, or $x := y + c$ with $c > 0$; or guards include diagonal constraints and updates are of the form $x := c$ or $x := y$.


6 Experiments 


We have implemented the reachability algorithm for timed automata with diagonal constraints (and only resets as updates) based on the simulation approach (p. xx), using the $\preceq_{\mathcal{A}}^{LU}$ simulation (Definition 7) for pruning zones. The algorithm for $Z \preceq_{\mathcal{A}}^{LU} Z'$ comes from Sect. 4. Experiments are reported in Table 1. We take model Cex from [8,30] and Fischer from [30]. We are not aware of any other "standard" benchmarks containing diagonal constraints. In addition to these two models, we introduce a new benchmark. This is an extension of job-shop scheduling with (diagonal-free) timed automata [1], in which the tasks within a job are logically independent. We add some timing dependencies between them, which are naturally modeled using diagonal constraints.

Table 1. Experiments: the column #D gives the number of diagonal constraints. Four methods are reported in the table. The first two methods, TChecker with our simulation relation $\preceq_{\mathcal{A}}^{LU}$ and the UPPAAL engine for diagonals, have been run on $\mathcal{A}$, the automaton containing diagonal constraints, whereas the third and fourth methods run the diagonal-free engines of UPPAAL and TChecker on $\mathcal{A}_{df}$, a diagonal-free equivalent of $\mathcal{A}$. Experiments were run on macOS X with a 2.3 GHz Intel Core i5 processor and 8 GB RAM. Time is reported in seconds. We set a timeout of 15 min. Each cell gives time / nodes count.

| Model | #D | $\mathcal{A}$: TChecker + $\preceq_{\mathcal{A}}^{LU}$ | $\mathcal{A}$: UPPAAL | $\mathcal{A}_{df}$: UPPAAL | $\mathcal{A}_{df}$: TChecker |
|---|---|---|---|---|---|
| Cex 2 | 4 | 0.047 / 241 | 0.026 / 2180 | 0.005 / 1039 | 0.067 / 1039 |
| Cex 3 | 6 | 7.399 / TALT | 111.168 / 182394 | 1.028 / 60982 | 40.092 / 60982 |
| Cex 4 | 8 | 857.662 / 185209 | Timeout | 734.543 / 3447119 | Timeout |
| Fischer 4 | 4 | 0.032 / 452 | 307.836 / 357687 | 0.009 / 1815 | 0.100 / 1815 |
| Fischer 5 | 5 | 0.257 / 1842 | Timeout | 0.116 / 12511 | 1.856 / 12511 |
| Fischer 7 | 7 | 15.032 / 26812 | Timeout | 174.560 / 693603 | Timeout |
| Job Shop 3 | 12 | 0.420 / 278 | 23.093 / 31711 | 0.003 / 845 | 0.312 / 845 |
| Job Shop 5 | 20 | 285.421 / 10592 | Timeout | 4.633 / 179607 | 150.811 / 179607 |

Each model considered above is a product of $k$ timed automata. In the table we write the name of the model and the number $k$ of automata involved in the product. We also report the number of diagonal constraints in each of them.


Experimental Results. We report the results of four methods of handling diagonal constraints, as described in the caption of Table 1. Under each method, we report the number of zones enumerated and the time taken. The first method gives a huge gain over the second one (up to four orders of magnitude in the number of nodes, and even better for time) and gives a less marked, but still significant, gain over the third and fourth methods. We provide a brief explanation of this phenomenon. The performance of the reachability algorithm depends on three factors:


– parameters of extrapolation or simulation: M-simulations, which use the maximum constant appearing in the guards, versus LU-simulations, which make a distinction between lower bound guards c < x and upper bound guards x < c (refer to [5] for the exact definitions of extrapolations based on these parameters, and [23] for simulations based on these parameters); LU-simulations are superior to M-simulations.

– computation of the parameters: global parameters, which associate a bound with each clock, versus the more local state-based parameters as in Definition 4, which associate a set of bound functions with each state [4]; local bounds are superior to global bounds.

– when diagonal constraints are present, whether zones get split or not: each time a zone gets split, new enumerations start from each of the new nodes; clearly, a no-splitting-of-zones approach is superior to zone splitting.
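To make the first factor concrete, the sketch below computes global bounds from non-diagonal guards in the style of [5]: $L(x)$ and $U(x)$ are the largest constants appearing in lower-bound and upper-bound guards on $x$, and $M(x) = \max(L(x), U(x))$ forgets that distinction. The guard encoding is a hypothetical one of ours; the local, state-based variant of Definition 4 would repeat this computation per state instead of once for the whole automaton.

```python
from math import inf
from typing import Dict, Iterable, List, Tuple

# Assumed guard encoding (ours): ("ub", x, c) is an upper-bound guard x < c (or x <= c),
# ("lb", x, c) is a lower-bound guard c < x (or c <= x).
Guard = Tuple[str, str, int]


def lu_bounds(guards: Iterable[Guard], clocks: List[str]):
    """Global L, U and M bounds per clock; -inf means the clock is never compared."""
    L: Dict[str, float] = {x: -inf for x in clocks}
    U: Dict[str, float] = {x: -inf for x in clocks}
    for kind, x, c in guards:
        if kind == "lb":
            L[x] = max(L[x], c)        # largest constant in a lower-bound guard
        elif kind == "ub":
            U[x] = max(U[x], c)        # largest constant in an upper-bound guard
        else:
            raise ValueError(f"unknown guard kind {kind!r}")
    M = {x: max(L[x], U[x]) for x in clocks}   # M ignores the lower/upper distinction
    return L, U, M
```

With guards $x < 5$, $3 < x$, $7 < x$ and $y < 10$, an M-simulation treats $x$ as bounded by 7 on both sides, while an LU-simulation can use the smaller upper bound $U(x) = 5$.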


56 P. Gastin et al. 


The algorithm of column 1 uses the superior heuristic in all three optimizations above. Avoiding the splitting of zones was possible thanks to our simulation approach, which temporarily splits zones for checking $Z \preceq_{\mathcal{A}}^{LU} Z'$, but never starts a new exploration from any of the split nodes. The algorithm of column 2, which is implemented in the current version UPPAAL 4.1, uses the inferior heuristic in all three. In particular, it is not clear how the extrapolation approach can avoid zone splitting in an efficient manner. The superiority of our approach gets amplified (by multiplicative factors) when we consider bigger products with many more diagonals. In the third and fourth methods, we give a diagonal-free equivalent of the original model (cf. Theorem 1) and use the UPPAAL and TChecker engines, respectively, for diagonal-free timed automata. The UPPAAL diagonal-free engine is highly optimized, and makes use of the superior heuristics in the first two optimizations mentioned above (the third heuristic is not applicable here, as the automaton is diagonal-free). The third and fourth methods can be considered as a good approximation of the zone-splitting approach to diagonal constraints using LU-abstractions and local guards.

The second and the third methods are the only possibilities for verifying timed models with diagonal constraints in UPPAAL. Both these approaches are in principle prone to a $2^{\#D}$ blowup compared to the first approach, where #D gives the number of diagonal constraints. The table shows that a good extent of this blowup indeed happens. The UPPAAL diagonal-free engine uses "minimal constraint systems" [6] for representing zones, whereas TChecker uses DBMs [15]. This explains why, even with the same number of nodes visited, UPPAAL performs better in terms of time. We have not included in the table the comparison with two other works dealing with the same problem: the refined diagonal-free conversion [30] and the extension of LU simulation for diagonals [18]. However, our results are better than those reported in the tables of these papers.


7 Conclusion 


We have proposed a new algorithm for handling diagonal constraints in timed 
automata, and extended it to automata with general updates. Our approach 
is based on a simulation relation between zones. From our preliminary exper- 
iments, we can infer that the use of simulations is indispensable in the pres- 
ence of diagonal constraints as zone-splitting can be avoided. Moreover, the fact 
that the simulation approach stores the actual zones (as opposed to abstracted 
zones in the extrapolation approach) has enabled optimizations for diagonal-free 
automata that work with dynamically changing simulation parameters (LU- 
bounds), which are learnt as and when the zones are expanded [22]. Working 
with actual zones is also convenient for finding cost-optimal paths in priced timed 
automata [11]. Investigating these in the presence of diagonal constraints is part 
of future work. Currently, we have not implemented our approach for updateable timed automata. This will also be part of our future work.

Working directly with a model containing diagonal constraints could be convenient (both during modeling, and during extraction of diagnostic traces) and can also potentially give a smaller automaton to begin with. We believe that our experiments provide hope that diagonal constraints can indeed be used.

Abstract. Discounted-sum inclusion (DS-inclusion, in short) formalizes the goal of comparing quantitative dimensions of systems such as cost, resource consumption, and the like, when the mode of aggregation for the quantitative dimension is discounted-sum aggregation. Discounted-sum comparator automata, or DS-comparators in short, are Büchi automata that read two infinite sequences of weights synchronously and relate their discounted-sums. Recent empirical investigations have shown that while DS-comparators enable competitive algorithms for DS-inclusion, they still suffer from the scalability bottleneck of Büchi operations.

Motivated by the connections between discounted-sum and Büchi automata, this paper undertakes an investigation of language-theoretic properties of DS-comparators in order to mitigate the challenges of Büchi DS-comparators and achieve improved scalability of DS-inclusion. Our investigation uncovers that DS-comparators possess safety and co-safety language-theoretic properties. As a result, they enable reductions based on subset construction-based methods as opposed to higher-complexity Büchi complementation, yielding tighter worst-case complexity and improved empirical scalability for DS-inclusion.


1 Introduction 


The analysis of quantitative dimensions of computing systems such as cost, resource consumption, and distance metrics [6,10,28] has been studied thoroughly to design efficient computing systems. Cost-aware program synthesis [14,16] and low-cost program repair [25] have found compelling applications in robotics [24,29], education [22], and the like. Quantitative verification facilitates efficient system design by automatically determining if a system implementation is more efficient than a specification model. Investigations in quantitative verification have demonstrated its high computational complexity and practical intractability [17,23]. This work addresses the practical intractability of quantitative verification.

At the core of quantitative verification lies the problem of quantitative inclusion, which formalizes the goal of determining which of two given systems is more efficient [17,23,31]. In quantitative inclusion, quantitative systems are abstracted as weighted automata [7,21,32]. A run in a weighted automaton is associated with a sequence of weights. The quantitative dimension of these runs is determined by the weight of runs, which is computed by taking an aggregate of the run's weight sequence. Quantitative inclusion can be thought of as the quantitative generalization of (qualitative) language inclusion.

A commonly appearing mode of aggregation is discounted-sum (DS) aggregation, which captures the intuition that weights incurred in the near future are more significant than those incurred later on [19]. The convergence of DS aggregation for all bounded infinite weight-sequences makes it a preferred mode of aggregation across domains: reinforcement learning [37], planning under uncertainty [34], and game theory [33]. This work examines the problem of discounted-sum inclusion, or DS-inclusion, that is, quantitative inclusion when discounted sum is the mode of aggregation.

In theory, DS-inclusion is PSPACE-complete for integer discount-factors [12]. Recent algorithmic approaches have tapped into language-theoretic properties of the discounted-sum aggregate function [12,18] to design practical algorithms for DS-inclusion [11,12]. These algorithms use DS-comparator automata (DS-comparators, in short) as their main technique, and are purely automata-theoretic. While these algorithms outperform other existing approaches for DS-inclusion in runtime [15,17], even they do not scale well on weighted automata with more than a few hundred states [11]. This work contributes novel techniques and algorithms for DS-inclusion to address its scalability challenge.

An in-depth examination of the DS-comparator-based algorithm exposes its scalability bottleneck. A DS-comparator is a Büchi automaton that relates the discounted-sum aggregates of two (bounded) weight-sequences A and B by determining the membership of the interleaved pair of sequences (A, B) in the language of the comparator. As a result, DS-comparators reduce DS-inclusion to language inclusion between (non-deterministic) Büchi automata. Although many techniques have been proposed to solve Büchi language inclusion efficiently in practice [4,20], none of them can avoid at least an exponential blow-up of $2^{O(n \log n)}$ for an $n$-sized input, caused by a direct or indirect involvement of Büchi complementation [36,40].

This work meets the scalability challenge of DS-inclusion by delving deeper into language-theoretic properties of discounted-sum aggregate functions [18] in order to obtain algorithms for DS-inclusion that render both tighter theoretical complexity and improved scalability. Specifically, we prove that DS-comparators can be expressed as safety automata or co-safety automata [26] (Sect. 3.1), and have compact deterministic constructions (Sect. 3.2). Safety and co-safety automata have the property that their complementation is performed by simpler subset-construction methods of lower $2^{O(n)}$ complexity [27]. As a result, they facilitate a procedure for DS-inclusion that uses subset-construction-based intermediate steps instead of Büchi complementation, yielding an improvement in theoretical complexity from $2^{O(n \log n)}$ to $2^{O(n)}$. Our subset-construction-based procedure has yet another advantage over Büchi complementation, as it supports efficient on-the-fly implementations, yielding practical scalability as well (Sect. 4).

An empirical evaluation of our prototype tool QuIPFly for the proposed procedure against the prior DS-comparator algorithm and other existing approaches for DS-inclusion shows that QuIPFly outperforms them by orders of magnitude both in runtime and the number of benchmarks solved (Sect. 4).

2 Preliminaries and Related Work


A weight-sequence, finite or infinite, is bounded if the absolute values of all of its elements are bounded by a fixed number.


Büchi Automaton: A Büchi automaton is a tuple $\mathcal{A} = (S, \Sigma, \delta, s_{\mathcal{I}}, \mathcal{F})$, where $S$ is a finite set of states, $\Sigma$ is a finite input alphabet, $\delta \subseteq (S \times \Sigma \times S)$ is the transition relation, state $s_{\mathcal{I}} \in S$ is the initial state, and $\mathcal{F} \subseteq S$ is the set of accepting states [39]. A Büchi automaton is deterministic if for all states $s$ and inputs $a$, $|\{s' \mid (s, a, s') \in \delta\}| \leq 1$. Otherwise, it is nondeterministic. A Büchi automaton is complete if for all states $s$ and inputs $a$, $|\{s' \mid (s, a, s') \in \delta\}| \geq 1$. For a word $w = w_0 w_1 \cdots \in \Sigma^\omega$, a run $\rho$ of $w$ is a sequence of states $s_0 s_1 \ldots$ such that $s_0 = s_{\mathcal{I}}$ and $\tau_i = (s_i, w_i, s_{i+1}) \in \delta$ for all $i$. Let $\mathit{inf}(\rho)$ denote the set of states that occur infinitely often in run $\rho$. A run $\rho$ is an accepting run if $\mathit{inf}(\rho) \cap \mathcal{F} \neq \emptyset$. A word $w$ is an accepting word if it has an accepting run. The language of Büchi automaton $\mathcal{A}$, denoted by $\mathcal{L}(\mathcal{A})$, is the set of all words accepted by $\mathcal{A}$. By abuse of notation, we write $w \in \mathcal{A}$ and $\rho \in \mathcal{A}$ if $w$ and $\rho$ are an accepting word and an accepting run of $\mathcal{A}$. Büchi automata are closed under set-theoretic union, intersection, and complementation [39].
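A deterministic, complete Büchi automaton has exactly one run per word, so for an ultimately periodic word $u \cdot v^\omega$ the set $\mathit{inf}(\rho)$ can be computed by reading copies of $v$ until the state at the start of a copy repeats. The Python sketch below does this; representing $\delta$ as a set of triples and words as strings are our own choices, and the restriction to lasso-shaped words $u \cdot v^\omega$ is an assumption made so that the input is finite.

```python
from typing import Hashable, List, Set, Tuple

State = Hashable
Transition = Tuple[State, str, State]   # (s, a, s') in delta


def successors(delta: Set[Transition], s: State, a: str) -> Set[State]:
    return {t for (p, b, t) in delta if p == s and b == a}


def is_deterministic(states, alphabet, delta) -> bool:
    return all(len(successors(delta, s, a)) <= 1 for s in states for a in alphabet)


def is_complete(states, alphabet, delta) -> bool:
    return all(len(successors(delta, s, a)) >= 1 for s in states for a in alphabet)


def accepts_lasso(delta, init, final, u, v) -> bool:
    """Does a deterministic, complete Buchi automaton accept the word u . v^omega?"""
    if not v:
        raise ValueError("the periodic part v must be non-empty")
    s = init
    for a in u:
        (s,) = successors(delta, s, a)            # exactly one successor
    starts: List[State] = []                      # state at the start of each copy of v
    blocks: List[Set[State]] = []                 # states entered while reading that copy
    while s not in starts:
        starts.append(s)
        entered = set()
        for a in v:
            (s,) = successors(delta, s, a)
            entered.add(s)
        blocks.append(entered)
    j = starts.index(s)                           # copies j, j+1, ... repeat forever
    inf_states = set().union(*blocks[j:])         # inf(rho)
    return bool(inf_states & set(final))          # inf(rho) intersects F
```

For example, the two-state automaton for "infinitely many a" over $\{a, b\}$ accepts $(ab)^\omega$ but rejects $a \cdot b^\omega$.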


Safety and Co-safety Properties: Let $\mathcal{L} \subseteq \Sigma^\omega$ be a language over alphabet $\Sigma$. A finite word $x \in \Sigma^*$ is a bad prefix for $\mathcal{L}$ if for all infinite words $y \in \Sigma^\omega$, $x \cdot y \notin \mathcal{L}$. A language $\mathcal{L}$ is a safety language if every word $w \notin \mathcal{L}$ has a bad prefix for $\mathcal{L}$. A language $\mathcal{L}$ is a co-safety language if its complement language is a safety language [5]. When a safety or co-safety language is an $\omega$-regular language, the Büchi automaton representing it is called a safety or co-safety automaton, respectively [26]. Wlog, safety and co-safety automata contain a sink state from which every outgoing transition loops back to the sink state, and there is a transition on every alphabet symbol. All states except the sink state are accepting in a safety automaton, while only the sink state is accepting in a co-safety automaton. Unlike Büchi complementation, complementation of safety and co-safety automata is conducted by the simpler subset construction with a lower $2^{O(n)}$ blow-up. The complement of a safety automaton is a co-safety automaton, and vice-versa. Safety automata are closed under intersection, and co-safety automata are closed under union.
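The subset-construction complementation of a safety automaton can be sketched in Python as follows. We assume, as our own encoding, that only the non-sink part of the automaton is listed and that any missing transition goes to the rejecting sink. Because every non-sink state is accepting, a word is rejected exactly when some prefix leaves no run outside the sink (by König's lemma, since the automaton is finitely branching). The complement is therefore a deterministic co-safety automaton whose only accepting state is the empty subset.

```python
from typing import FrozenSet, Hashable, Iterable, Set, Tuple

State = Hashable
# A safety automaton given by its non-sink part only: every (state, letter) pair with
# no listed transition implicitly goes to the rejecting sink.
Transition = Tuple[State, str, State]


def post(delta: Set[Transition], S: FrozenSet[State], a: str) -> FrozenSet[State]:
    """Subset construction step: the non-sink states reachable from S on letter a."""
    return frozenset(t for (p, b, t) in delta if p in S and b == a)


def reaches_sink(delta: Set[Transition], init: State, word: Iterable[str]) -> bool:
    """True iff every run on `word` has entered the sink; such a word is a bad prefix."""
    S: FrozenSet[State] = frozenset({init})
    for a in word:
        S = post(delta, S, a)
    return not S


def complement(delta: Set[Transition], init: State, alphabet: str):
    """Deterministic co-safety automaton for the complement, via subset construction.

    States are subsets of non-sink states; the empty set is the sink and the only
    accepting state.  At most 2^n subsets are built, the 2^{O(n)} blow-up.
    """
    start = frozenset({init})
    states, todo, trans = {start}, [start], set()
    while todo:
        S = todo.pop()
        for a in alphabet:
            T = post(delta, S, a)
            trans.add((S, a, T))
            if T not in states:
                states.add(T)
                todo.append(T)
    return states, trans, start, {frozenset()}
```

For the safety language "never two consecutive b's" over $\{a, b\}$, the prefix $abb$ falls into the sink, and the complement automaton has just the three subsets $\{0\}$, $\{1\}$ and $\emptyset$.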


Comparator Automaton: For a finite set of integers $\Sigma$, an aggregate function $f : \mathbb{Z}^\omega \to \mathbb{R}$, and equality or inequality relation $R \in \{<, >, \leq, \geq, =, \neq\}$, the comparison language for $f$ with relation $R$ is a language of infinite words over the alphabet $\Sigma \times \Sigma$ that accepts a pair $(A, B)$ iff $f(A)\ R\ f(B)$ holds. A comparator automaton (comparator, in short) for aggregate function $f$ and relation $R$ is an automaton that accepts the comparison language for $f$ with $R$ [12]. A comparator is said to be regular if its automaton is a Büchi automaton.


Weighted Automaton: A weighted automaton over infinite words is a tuple $\mathcal{A} = (\mathcal{M}, \gamma, f)$, where $\mathcal{M} = (S, \Sigma, \delta, s_{\mathcal{I}}, S)$ is a complete Büchi automaton with all states as accepting, $\gamma : \delta \to \mathbb{N}$ is a weight function, and $f : \mathbb{N}^\omega \to \mathbb{R}$ is the aggregate function [17,31]. Words and runs in weighted automata are defined as in Büchi automata. The weight-sequence of run $\rho = s_0 s_1 \ldots$ of word $w = w_0 w_1 \ldots$ is given by $wt_\rho = n_0 n_1 n_2 \ldots$ where $n_i = \gamma(s_i, w_i, s_{i+1})$ for all $i$. The weight of a run $\rho$, denoted by $f(\rho)$, is given by $f(wt_\rho)$. Here the weight of a word $w \in \Sigma^\omega$ in weighted automaton $\mathcal{A}$ is defined as $wt_{\mathcal{A}}(w) = \sup\{f(\rho) \mid \rho \text{ is a run of } w \text{ in } \mathcal{A}\}$.


Quantitative Inclusion: Let $P$ and $Q$ be weighted automata with the same aggregate function. The strict quantitative inclusion problem, denoted by $P \subset Q$, asks whether for all words $w \in \Sigma^\omega$, $wt_P(w) < wt_Q(w)$. The non-strict quantitative inclusion problem, denoted by $P \subseteq Q$, asks whether for all words $w \in \Sigma^\omega$, $wt_P(w) \leq wt_Q(w)$. The comparison language or comparator of a quantitative inclusion problem refers to the comparison language or comparator of the associated aggregate function.


Discounted-sum Inclusion: Let $A = A_0, A_1, \ldots$ be a weight sequence and $d > 1$ be a rational number. The discounted-sum (DS in short) of $A$ with discount-factor $d > 1$ is $DS(A, d) = \sum_{i=0}^{\infty} \frac{A_i}{d^i}$. The DS-comparison language and DS-comparator with discount-factor $d > 1$ are the comparison language and comparator obtained for the discounted-sum aggregate function with discount-factor $d > 1$, respectively. Strict or non-strict discounted-sum inclusion is strict or non-strict quantitative inclusion with the discounted-sum aggregate function, respectively. For brevity, we abbreviate discounted-sum inclusion to DS-inclusion.
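Because the weights are bounded by some $\mu$, a finite prefix already pins $DS(W, d)$ down to a small interval. This uses the decomposition $DS(W, d) = DS(W[i], d) + \frac{1}{d^i} \cdot DS(W[i\ldots], d)$ that appears later in the proof of Theorem 1, together with $|DS(W[i\ldots], d)| \leq \mu \cdot \frac{d}{d-1}$. The Python sketch below assumes an integer discount-factor, the case this work focuses on, and uses exact rationals. The function names are illustrative.

```python
from fractions import Fraction
from typing import Sequence, Tuple


def ds_finite(A: Sequence[int], d: int) -> Fraction:
    """DS(A[i], d) = sum_{j < i} A_j / d^j for a finite weight sequence."""
    return sum((Fraction(a, d ** j) for j, a in enumerate(A)), Fraction(0))


def ds_enclosure(prefix: Sequence[int], d: int, mu: int) -> Tuple[Fraction, Fraction]:
    """Interval containing DS(W, d) for every infinite W extending `prefix` with |W_j| <= mu.

    Uses DS(W, d) = DS(W[i], d) + DS(W[i...], d) / d^i and |DS(W[i...], d)| <= mu * d / (d - 1).
    """
    if d <= 1:
        raise ValueError("the discount-factor must satisfy d > 1")
    i = len(prefix)
    tail = Fraction(mu * d, d - 1) / Fraction(d) ** i
    base = ds_finite(prefix, d)
    return base - tail, base + tail
```

The width of the interval shrinks like $d^{-i}$, which is the exponentially diminishing tail contribution that the safety argument exploits.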


Related Work. The decidability of DS-inclusion is an open problem when the 
discount-factor d > 1 is arbitrary. Recent work has established that DS-inclusion 
is PSPACE-complete when the discount-factor is an integer [12]. This work inves- 
tigates algorithmic approaches to DS-inclusion with integer discount-factors. 

Two contrasting solution approaches have been identified for DS-inclusion. The first approach is hybrid [17]. It separates out the language-theoretic aspects of weighted automata from the numerical aspects, and solves each separately [15,17]. More specifically, the hybrid approach solves the language-theoretic aspects by DS-determinization [15] and the numerical aspects by linear programming [8,9], sequentially. To the best of our knowledge, these steps cannot be performed in parallel. As a result, this approach must always incur the exponential cost of DS-determinization.

The second approach is purely automata-theoretic [12]. This approach uses regular DS-comparators to reduce DS-inclusion to language inclusion between non-deterministic Büchi automata [11,12]. While the purely automata-theoretic approach scales better than the hybrid approach in runtime [11], its scalability suffers from fundamental algorithmic limitations of Büchi language inclusion. A key ingredient of Büchi language inclusion is Büchi complementation [36]. Büchi complementation is $2^{O(n \log n)}$ in the worst case, and is practically intractable [40]. These limitations also feature in the theoretical complexity and practical performance of DS-inclusion. The complexity of DS-inclusion between weighted automata $P$ and $Q$ with regular DS-comparator $C$ for integer discount-factor $d > 1$ is $|P| \cdot 2^{O(|P| \cdot |Q| \cdot |C| \cdot \log(|P| \cdot |Q| \cdot |C|))}$.

This work improves the worst-case complexity and practical performance of the purely automata-theoretic approach for DS-inclusion by a closer investigation of language-theoretic properties of DS-comparators. In particular, we identify that DS-comparators for integer discount-factors form safety or co-safety automata (depending on the relation $R$). We show that the complementation advantage of safety/co-safety automata not only improves the theoretical complexity of DS-inclusion with integer discount-factor but also facilitates on-the-fly implementations that significantly improve practical performance.


3 DS-inclusion with Integer Discount-Factor 


This section covers the core technical contributions of this paper. We uncover novel language-theoretic properties of DS-comparison languages and utilize them to obtain a tighter theoretical upper bound for DS-inclusion with integer discount-factor. Unless mentioned otherwise, the discount-factor is an integer.

In Sect. 3.1 we prove that DS-comparison languages are either safety or co-safety for all rational discount-factors. Since DS-comparison languages are $\omega$-regular for integer discount-factors [12], we obtain that DS-comparators for integer discount-factors form safety or co-safety automata. Next, Sect. 3.2 makes use of the newly obtained safety/co-safety properties of DS-comparators to present the first deterministic constructions for DS-comparators. These deterministic constructions are compact in the sense that they match their non-deterministic counterparts in number of states [11]. Section 3.3 evaluates the complexity of quantitative inclusion with regular safety/co-safety comparators, and observes that its complexity is lower than the complexity of quantitative inclusion with regular comparators. Finally, since DS-comparators are regular safety/co-safety comparators, our analysis shows that the complexity of DS-inclusion improves as a consequence of the complexity observed for quantitative inclusion with regular safety/co-safety comparators.

We begin with formal definitions of safety/co-safety comparison languages 
and safety /co-safety comparators: 


Definition 1 (Safety and co-safety comparison languages). Let $\Sigma$ be a finite set of integers, $f : \mathbb{Z}^\omega \to \mathbb{R}$ be an aggregate function, and $R \in \{<, \leq, >, \geq, =, \neq\}$ be a relation. A comparison language $L$ over $\Sigma \times \Sigma$ for aggregate function $f$ and relation $R$ is said to be a safety comparison language (or a co-safety comparison language) if $L$ is a safety language (or a co-safety language).


Definition 2 (Safety and co-safety comparators). Let $\Sigma$ be a finite set of integers, $f : \mathbb{Z}^\omega \to \mathbb{R}$ be an aggregate function, and $R \in \{<, \leq, >, \geq, =, \neq\}$ be a relation. A comparator for aggregate function $f$ and relation $R$ is a safety comparator (or co-safety comparator) if the comparison language for $f$ and $R$ is a safety language (or co-safety language).

A safety comparator is regular if its language is $\omega$-regular (equivalently, if its automaton is a safety automaton). Likewise, a co-safety comparator is regular if its language is $\omega$-regular (equivalently, if its automaton is a co-safety automaton).

By complementation duality of safety and co-safety languages, the comparison language for an aggregate function $f$ for non-strict inequality $\leq$ is safety iff the comparison language for $f$ for strict inequality $<$ is co-safety. Since safety languages and safety automata are closed under intersection, safety comparison languages and regular safety comparators for non-strict inequality render the same for equality. Similarly, since co-safety languages and co-safety automata are closed under union, co-safety comparison languages and regular co-safety comparators for strict inequality render the same for the inequality relation $\neq$. Therefore, it suffices to examine the comparison language for one relation only.

It is worth noting that for weight-sequences $A$ and $B$ and all relations $R$, we have that $DS(A, d)\ R\ DS(B, d)$ iff $DS(A - B, d)\ R\ 0$, where $(A - B)_i = A_i - B_i$ for all $i \geq 0$. Prior work [11] shows that we can define the DS-comparison language with upper bound $\mu$, discount-factor $d > 1$, and relation $R$ to accept infinite and bounded weight-sequences $C$ over $\{-\mu, \ldots, \mu\}$ iff $DS(C, d)\ R\ 0$ holds. Similarly, the DS-comparator with the same parameters $\mu$ and $d > 1$ accepts the DS-comparison language with parameters $\mu$, $d$ and $R$. We adopt these definitions for DS-comparison languages and DS-comparators.
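For ultimately periodic sequences, membership in a DS-comparison language can be decided exactly. Splitting $C = u \cdot v^\omega$ and summing the geometric series over the copies of $v$ gives $DS(u \cdot v^\omega, d) = DS(u, d) + \frac{1}{d^{|u|}} \cdot \frac{DS(v, d)}{1 - d^{-|v|}}$. The Python sketch below relies on this formula. The lasso restriction and the function names are our own illustrative assumptions, and an integer discount-factor is assumed as in the rest of this section.

```python
import operator
from fractions import Fraction
from typing import Sequence

RELATIONS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
             ">=": operator.ge, "=": operator.eq, "!=": operator.ne}


def ds_finite(A: Sequence[int], d: int) -> Fraction:
    return sum((Fraction(a, d ** j) for j, a in enumerate(A)), Fraction(0))


def ds_lasso(u: Sequence[int], v: Sequence[int], d: int) -> Fraction:
    """Exact DS(u . v^omega, d) = DS(u, d) + DS(v, d) / ((1 - d^-|v|) * d^|u|)."""
    if not v:
        raise ValueError("the periodic part v must be non-empty")
    cycle = ds_finite(v, d) / (1 - Fraction(1, d ** len(v)))
    return ds_finite(u, d) + cycle / Fraction(d) ** len(u)


def in_comparison_language(u, v, d: int, mu: int, rel: str) -> bool:
    """Is C = u . v^omega over {-mu, ..., mu} in the DS-comparison language for (mu, d, R)?"""
    if any(abs(c) > mu for c in list(u) + list(v)):
        raise ValueError("weights must lie in {-mu, ..., mu}")
    return RELATIONS[rel](ds_lasso(u, v, d), 0)
```

For example, with $d = 2$ the sequences $A = 1^\omega$ and $B = 0 \cdot 2^\omega$ both have discounted-sum 2. Their difference $A - B = 1 \cdot (-1)^\omega$ therefore has discounted-sum 0, so it lies in the comparison language for $\leq$ but not in the one for $<$.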

Throughout this section, the concatenation of a finite sequence $x$ with a finite or infinite sequence $y$ is denoted by $x \cdot y$.


3.1 DS-comparison Languages and Their Safety/Co-safety Properties


The central result of this section is that DS-comparison languages are safety or co-safety languages for all (integer and non-integer) discount-factors (Theorem 1). In particular, since DS-comparison languages are $\omega$-regular for integer discount-factors [12], this implies that DS-comparators for integer discount-factors form safety or co-safety automata (Corollary 1).

The argument for safety/co-safety of DS-comparison languages depends on 
the property that the discounted-sum aggregate of all bounded weight-sequences 
exists for all discount-factors d > 1 [35]. 


Theorem 1. Let $\mu > 1$ be the upper bound. For rational discount-factor $d > 1$:


1. DS-comparison languages are safety languages for relations $R \in \{\leq, \geq, =\}$.
2. DS-comparison languages are co-safety languages for relations $R \in \{<, >, \neq\}$.


Proof (Proof sketch). Due to duality of safety/co-safety languages, it suffices to show that the DS-comparison language with $\leq$ is a safety language.

Let the DS-comparison language with upper bound $\mu$, rational discount-factor $d > 1$ and relation $\leq$ be denoted by $\mathcal{L}^{\mu,d}_{\leq}$. Suppose that $\mathcal{L}^{\mu,d}_{\leq}$ is not a safety language. Let $W$ be a weight-sequence in the complement of $\mathcal{L}^{\mu,d}_{\leq}$ such that $W$ does not have a bad prefix. Then the following hold: (a) $DS(W,d) > 0$; (b) for all $i \geq 0$, the $i$-length prefix $W[i]$ of $W$ can be extended to an infinite and bounded weight-sequence $W[i] \cdot Y^i$ such that $DS(W[i] \cdot Y^i, d) \leq 0$.

Note that $DS(W,d) = DS(W[i],d) + \frac{1}{d^i} \cdot DS(W[i\ldots],d)$, where $W[i\ldots] = W_i W_{i+1} \ldots$ and $DS(W[i],d)$ is the discounted-sum of the finite sequence $W[i]$, i.e. $DS(W[i],d) = \sum_{j=0}^{i-1} \frac{W_j}{d^j}$. Similarly, $DS(W[i] \cdot Y^i, d) = DS(W[i],d) + \frac{1}{d^i} \cdot DS(Y^i, d)$. The contribution of the tail sequences $W[i\ldots]$ and $Y^i$ to the discounted-sum of $W$ and $W[i] \cdot Y^i$, respectively, diminishes exponentially as $i$ increases. In addition, since $W$ and $W[i] \cdot Y^i$ share a common $i$-length prefix $W[i]$, their discounted-sum values must converge to each other. Since the discounted sum of $W$ is fixed and greater than 0, by convergence there must be a $k \geq 0$ such that $DS(W[k] \cdot Y^k, d) > 0$. This contradicts (b).

Therefore, the DS-comparison language with $\leq$ is a safety language.
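The convergence step can be made quantitative: if two bounded weight-sequences share a prefix of length $i$, each tail contributes at most $\frac{\mu \cdot d}{d-1}$ in absolute value, scaled by $\frac{1}{d^i}$, so their discounted sums differ by at most $\frac{2\mu \cdot d}{(d-1) \cdot d^i}$. The Python sketch below checks this bound on ultimately periodic sequences $u \cdot v^\omega$, whose discounted sum has the closed form $DS(u,d) + \frac{1}{d^{|u|}} \cdot \frac{DS(v,d)}{1 - d^{-|v|}}$. Restricting to such lasso-shaped sequences, and the helper names, are assumptions made for illustration only.

```python
from fractions import Fraction


def ds_finite(weights, d):
    """Exact discounted sum of a finite weight-sequence."""
    d = Fraction(d)
    return sum((Fraction(w) / d**i for i, w in enumerate(weights)), Fraction(0))


def ds_lasso(u, v, d):
    """Exact discounted sum of the ultimately periodic sequence u . v^omega (v non-empty)."""
    d = Fraction(d)
    loop = ds_finite(v, d) / (1 - d ** (-len(v)))
    return ds_finite(u, d) + loop / d ** len(u)


def tail_bound(mu, d, i):
    """Bound on |DS(P . X, d) - DS(P . Y, d)| for |P| = i and X, Y bounded by mu."""
    d = Fraction(d)
    return 2 * mu * d / ((d - 1) * d**i)


mu, d = 2, Fraction(5, 3)
prefix = [2, -1, 0, 2, 1, -2]
for i in range(len(prefix) + 1):
    p = prefix[:i]
    a = ds_lasso(p + [2], [-2, 1], d)   # one bounded extension of p
    b = ds_lasso(p, [-2], d)            # another one: the minimal tail (-mu)^omega
    assert abs(a - b) <= tail_bound(mu, d, i)
```

The bound shrinks geometrically in $i$, which is exactly why a sequence with strictly positive discounted sum must eventually have a prefix that no bounded tail can pull back to 0.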


Semantically, this result implies that for a bounded weight-sequence $C$ and rational discount-factor $d > 1$, if $DS(C,d) > 0$ then $C$ must have a finite prefix $C_{\mathrm{pre}}$ such that the discounted-sum of the finite prefix is so large that no infinite extension by a bounded weight-sequence $Y$ can reduce the discounted-sum of $C_{\mathrm{pre}} \cdot Y$ with the same discount-factor $d$ to zero or below.

Prior work shows that DS-comparison languages are expressed by Büchi automata iff the discount-factor is an integer [13]. Therefore:


Corollary 1. Let $\mu > 1$ be the upper bound. For integer discount-factor $d > 1$:


1. DS-comparators are regular safety for relations $R \in \{\leq, \geq, =\}$.
2. DS-comparators are regular co-safety for relations $R \in \{<, >, \neq\}$.


Lastly, it is worth mentioning that for the same reason [13] DS-comparators for 
non-integer rational discount-factors do not form safety or co-safety automata. 


3.2 Deterministic DS-comparator for Integer Discount-Factor 


This section presents deterministic safety/co-safety constructions for DS-comparators with integer discount-factors. This differs from prior work, which supplies non-deterministic Büchi constructions only [11,12]. One outcome of DS-comparators being regular safety/co-safety (Corollary 1) is a proof that DS-comparators permit deterministic Büchi constructions, since non-deterministic and deterministic safety automata (and co-safety automata) have equal expressiveness [26]. Therefore, one way to obtain a deterministic Büchi construction for DS-comparators is to determinize the non-deterministic constructions using standard procedures [26,36]. However, this results in exponentially larger deterministic constructions. Instead, this section offers direct deterministic safety/co-safety automata constructions for DS-comparators that not only avoid an exponential blow-up but also match their non-deterministic counterparts in number of states (Theorem 3).

Key ideas. Due to duality and closure properties of safety/co-safety automata, we only present the construction of deterministic safety automata for the DS-comparator with upper bound $\mu$, integer discount-factor $d > 1$ and relation $\leq$, denoted by $\mathcal{A}^{\mu,d}_{\leq}$. We proceed by obtaining a deterministic finite automaton (DFA), denoted by $\mathit{bad}(\mu, d, \leq)$, for the language of bad-prefixes of $\mathcal{A}^{\mu,d}_{\leq}$ (Theorem 2). Trivial modifications to $\mathit{bad}(\mu, d, \leq)$ will furnish the coveted deterministic safety automaton for $\mathcal{A}^{\mu,d}_{\leq}$ (Theorem 3).


Construction. We begin with some definitions. Let $W$ be a finite weight-sequence. By abuse of notation, the discounted-sum of the finite sequence $W$ with discount-factor $d$ is defined as $DS(W,d) = DS(W \cdot 0^\omega, d)$. The recoverable-gap of a finite weight-sequence $W$ with discount-factor $d$, denoted $gap(W,d)$, is its normalized discounted-sum: if $W = \varepsilon$ (the empty sequence), $gap(\varepsilon, d) = 0$, and $gap(W,d) = d^{|W|-1} \cdot DS(W,d)$ otherwise [15]. Observe that the recoverable-gap has an inductive definition, i.e. $gap(\varepsilon, d) = 0$, where $\varepsilon$ is the empty weight-sequence, and $gap(W \cdot v, d) = d \cdot gap(W,d) + v$, where $v \in \{-\mu, \ldots, \mu\}$.
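The two definitions of the recoverable-gap agree, since $d^{|W|} \cdot DS(W \cdot v, d) = d \cdot d^{|W|-1} \cdot DS(W,d) + v$. The short Python sketch below checks this agreement on a few sequences with exact rationals; the function names are illustrative and not from the paper.

```python
from fractions import Fraction


def ds(weights, d):
    """Exact discounted sum of a finite weight-sequence (padded with 0^omega)."""
    d = Fraction(d)
    return sum((Fraction(w) / d**i for i, w in enumerate(weights)), Fraction(0))


def gap_closed_form(weights, d):
    """gap(eps, d) = 0 and gap(W, d) = d^(|W|-1) * DS(W, d) for non-empty W."""
    if not weights:
        return Fraction(0)
    return Fraction(d) ** (len(weights) - 1) * ds(weights, d)


def gap_inductive(weights, d):
    """gap(eps, d) = 0 and gap(W . v, d) = d * gap(W, d) + v, applied left to right."""
    g = Fraction(0)
    for v in weights:
        g = Fraction(d) * g + v
    return g


for w in ([], [1], [1, -2, 3], [-2, 2, 2, -1, 0]):
    for d in (2, 3, Fraction(5, 2)):
        assert gap_closed_form(w, d) == gap_inductive(w, d)
```

With an integer discount-factor, the inductive form only multiplies and adds integers, so every recoverable gap is an integer; the construction of $\mathit{bad}(\mu, d, \leq)$ relies on exactly this.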

This observation suggests a sketch for $\mathit{bad}(\mu, d, \leq)$. Suppose all possible values for the recoverable-gap of weight-sequences form the set of states. Then the transition relation of the DFA can mimic the inductive definition of recoverable gap, i.e. there is a transition from state $s$ to $t$ on alphabet $v \in \{-\mu, \ldots, \mu\}$ iff $t = d \cdot s + v$, where $s$ and $t$ are recoverable-gap values of weight-sequences. There is one caveat here: there are infinitely many possibilities for the values of recoverable gap. We need to limit the recoverable gap values to finitely many values of interest. The core aspect of this construction is to identify these values.

First, we obtain a lower bound on the recoverable gap for bad-prefixes of $\mathcal{A}^{\mu,d}_{\leq}$.


Lemma 1. Let $\mu$ and $d > 1$ be the bound and discount-factor, resp. Let $T = \frac{\mu}{d-1}$ be the threshold value. Let $W$ be a non-empty, bounded, finite weight-sequence. Weight-sequence $W$ is a bad-prefix of $\mathcal{A}^{\mu,d}_{\leq}$ iff $gap(W,d) > T$.


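Proof. Let a finite weight-sequence $W$ be a bad-prefix of $\mathcal{A}^{\mu,d}_{\leq}$. Then $DS(W \cdot Y, d) > 0$ for all infinite and bounded weight-sequences $Y$. Since $DS(W \cdot Y, d) = DS(W,d) + \frac{1}{d^{|W|}} \cdot DS(Y,d)$, we get $\inf_Y \big(DS(W,d) + \frac{1}{d^{|W|}} \cdot DS(Y,d)\big) > 0 \implies DS(W,d) + \frac{1}{d^{|W|}} \cdot \inf_Y DS(Y,d) > 0$ as $W$ is a fixed sequence. Since $\inf_Y DS(Y,d) = -\frac{\mu \cdot d}{d-1}$ (attained by $Y = (-\mu)^\omega$), we get $DS(W,d) - \frac{\mu}{d^{|W|-1}(d-1)} > 0 \implies gap(W,d) - T > 0$. Conversely, for all infinite, bounded weight-sequences $Y$, $DS(W \cdot Y, d) \cdot d^{|W|-1} = gap(W,d) + \frac{1}{d} \cdot DS(Y,d)$. Since $gap(W,d) > T$ and $\inf_Y DS(Y,d) = -T \cdot d$, we get $DS(W \cdot Y, d) > 0$.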
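Lemma 1 can be sanity-checked by brute force. Because the infimum of $DS(Y,d)$ over bounded $Y$ is attained by $Y = (-\mu)^\omega$, a non-empty $W$ is a bad prefix iff $DS(W \cdot (-\mu)^\omega, d) > 0$. The Python sketch below compares that criterion with the test $gap(W,d) > T$ on all short sequences for a few parameter choices; the exhaustive enumeration and the helper names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def ds(weights, d):
    """Exact discounted sum of a finite weight-sequence."""
    d = Fraction(d)
    return sum((Fraction(w) / d**i for i, w in enumerate(weights)), Fraction(0))


def gap(weights, d):
    """Recoverable gap via gap(W . v, d) = d * gap(W, d) + v."""
    g = Fraction(0)
    for v in weights:
        g = Fraction(d) * g + v
    return g


def is_bad_prefix(weights, mu, d):
    """Lemma 1: non-empty W is a bad prefix of the <=-comparator iff gap(W, d) > T."""
    threshold = Fraction(mu) / (Fraction(d) - 1)   # T = mu / (d - 1)
    return gap(weights, d) > threshold


def worst_extension_ds(weights, mu, d):
    """DS(W . (-mu)^omega, d), which attains the infimum of DS(W . Y, d) over bounded Y."""
    d = Fraction(d)
    return ds(weights, d) + (-mu * d / (d - 1)) / d ** len(weights)


# W is a bad prefix iff even the worst bounded extension keeps the sum above 0.
for mu, d in ((2, 3), (1, 2), (2, Fraction(3, 2))):
    for n in range(1, 4):
        for w in product(range(-mu, mu + 1), repeat=n):
            assert is_bad_prefix(list(w), mu, d) == (worst_extension_ds(list(w), mu, d) > 0)
```

Both sides differ only by the positive factor $d^{|W|-1}$, which is why the check holds exactly, including the boundary case where the worst extension reaches exactly 0.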


Since all finite and bounded extensions of bad-prefixes are also bad-prefixes, Lemma 1 implies that if the recoverable-gap of a finite sequence strictly exceeds the threshold $T$, then the recoverable gaps of all of its extensions also exceed $T$. Since a recoverable gap exceeding threshold $T$ is the precise condition for bad-prefixes, all states with recoverable gap exceeding $T$ can be merged into a single state. Note that this state forms an accepting sink in $\mathit{bad}(\mu, d, \leq)$.

Next, we attempt to merge very low recoverable gap values into a single state. For this purpose, we define very-good prefixes for $\mathcal{A}^{\mu,d}_{\leq}$. A finite and bounded weight-sequence $W$ is a very-good prefix for the language of $\mathcal{A}^{\mu,d}_{\leq}$ if for all infinite, bounded extensions of $W$ by $Y$, $DS(W \cdot Y, d) \leq 0$. A proof similar to that of Lemma 1 proves an upper bound for the recoverable gap of very-good prefixes of $\mathcal{A}^{\mu,d}_{\leq}$:


Lemma 2. Let $\mu$ and $d > 1$ be the bound and discount-factor, resp. Let $T = \frac{\mu}{d-1}$ be the threshold value. Let $W$ be a non-empty, bounded, finite weight-sequence. Weight-sequence $W$ is a very-good prefix of $\mathcal{A}^{\mu,d}_{\leq}$ iff $gap(W,d) \leq -T$.


Clearly, finite extensions of very-good prefixes are also very-good prefixes. Further, $\mathit{bad}(\mu, d, \leq)$ must not accept very-good prefixes. Thus, by reasoning as earlier, we get that all recoverable gap values that are less than or equal to $-T$ can be merged into one non-accepting sink state in $\mathit{bad}(\mu, d, \leq)$.
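Lemmas 1 and 2 together split every non-empty prefix into three classes. The Python sketch below classifies a prefix as bad, very-good, or undetermined from its recoverable gap alone, and checks on a small instance that extensions of a very-good prefix stay very-good, since $d \cdot gap + v \leq -d \cdot T + \mu = -T$. The function name and the string labels are illustrative assumptions.

```python
from fractions import Fraction


def gap(weights, d):
    """Recoverable gap via gap(W . v, d) = d * gap(W, d) + v."""
    g = Fraction(0)
    for v in weights:
        g = Fraction(d) * g + v
    return g


def classify_prefix(weights, mu, d):
    """Classify a non-empty, bounded, finite weight-sequence using Lemma 1
    (bad: gap > T) and Lemma 2 (very-good: gap <= -T), where T = mu / (d - 1)."""
    t = Fraction(mu) / (Fraction(d) - 1)
    g = gap(weights, d)
    if g > t:
        return "bad"           # every bounded extension has DS > 0
    if g <= -t:
        return "veryGood"      # every bounded extension has DS <= 0
    return "undetermined"      # the verdict still depends on the infinite tail


mu, d = 2, 3   # T = 1
assert classify_prefix([2], mu, d) == "bad"
assert classify_prefix([-1], mu, d) == "veryGood"
assert classify_prefix([1], mu, d) == "undetermined"
# Extensions of very-good prefixes stay very-good.
assert all(classify_prefix([-1, v], mu, d) == "veryGood" for v in range(-mu, mu + 1))
```

Only the undetermined prefixes need distinct states; this is what keeps the automaton finite.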

Finally, for an integer discount-factor the recoverable gap is an integer. Let $\lfloor x \rfloor$ denote the floor of $x \in \mathbb{R}$, e.g. $\lfloor 2.3 \rfloor = 2$, $\lfloor -2 \rfloor = -2$, $\lfloor -2.3 \rfloor = -3$. Then:


Corollary 2. Let $\mu$ be the bound and $d > 1$ an integer discount-factor. Let $T = \frac{\mu}{d-1}$ be the threshold. Let $W$ be a non-empty, bounded, finite weight-sequence.


- $W$ is a bad prefix of $\mathcal{A}^{\mu,d}_{\leq}$ iff $gap(W,d) > \lfloor T \rfloor$
- $W$ is a very-good prefix of $\mathcal{A}^{\mu,d}_{\leq}$ iff $gap(W,d) \leq \lfloor -T \rfloor$


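So, the recoverable gap value is either one of $\{\lfloor -T \rfloor + 1, \ldots, \lfloor T \rfloor\}$, or less than or equal to $\lfloor -T \rfloor$, or greater than $\lfloor T \rfloor$. This curbs the state-space to $O(\mu)$-many values of interest, as $T = \frac{\mu}{d-1} < \frac{\mu \cdot d}{d-1}$ and $1 < \frac{d}{d-1} \leq 2$. Lastly, since $gap(\varepsilon, d) = 0$, state 0 must be the initial state.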


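Construction of $\mathit{bad}(\mu, d, \leq)$. Let $\mu$ be the upper bound, and $d > 1$ be the integer discount-factor. Let $T = \frac{\mu}{d-1}$ be the threshold value. The finite-state automaton $\mathit{bad}(\mu, d, \leq) = (S, s_I, \Sigma, \delta, F)$ is defined as follows:


- States $S = \{\lfloor -T \rfloor + 1, \ldots, \lfloor T \rfloor\} \cup \{\mathit{bad}, \mathit{veryGood}\}$
- Initial state $s_I = 0$, accepting states $F = \{\mathit{bad}\}$
- Alphabet $\Sigma = \{-\mu, -\mu + 1, \ldots, \mu - 1, \mu\}$
- Transition function $\delta : S \times \Sigma \to S$, where for $(s, a, t) \in \delta$:
  1. If $s \in \{\mathit{bad}, \mathit{veryGood}\}$, then $t = s$ for all $a \in \Sigma$
  2. If $s \in \{\lfloor -T \rfloor + 1, \ldots, \lfloor T \rfloor\}$ and $a \in \Sigma$:
     (a) If $\lfloor -T \rfloor < d \cdot s + a \leq \lfloor T \rfloor$, then $t = d \cdot s + a$
     (b) If $d \cdot s + a > \lfloor T \rfloor$, then $t = \mathit{bad}$
     (c) If $d \cdot s + a \leq \lfloor -T \rfloor$, then $t = \mathit{veryGood}$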
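The construction translates directly into a transition table. The Python sketch below builds $\mathit{bad}(\mu, d, \leq)$ for an integer $d > 1$, computing $\lfloor T \rfloor$ and $\lfloor -T \rfloor$ with integer floor division to avoid floating-point error, and cross-checks Theorem 2 against the gap criterion of Lemma 1 on all short words. The dictionary representation and the function names are illustrative choices, not the paper's implementation.

```python
from fractions import Fraction
from itertools import product


def build_bad_dfa(mu, d):
    """Sketch of the DFA bad(mu, d, <=) for an integer discount-factor d > 1.
    Integer states are recoverable gaps in {floor(-T)+1, ..., floor(T)}, T = mu/(d-1);
    'bad' (accepting) and 'veryGood' (non-accepting) are sinks."""
    assert isinstance(d, int) and d > 1 and mu >= 1
    hi = mu // (d - 1)        # floor(T)
    lo = (-mu) // (d - 1)     # floor(-T): Python's // rounds towards -infinity
    gaps = range(lo + 1, hi + 1)
    alphabet = range(-mu, mu + 1)
    delta = {}
    for s in ("bad", "veryGood"):
        for a in alphabet:
            delta[(s, a)] = s                     # rule 1: sinks loop on every letter
    for s in gaps:
        for a in alphabet:
            t = d * s + a                         # inductive step of the recoverable gap
            if t > hi:
                delta[(s, a)] = "bad"             # rule 2(b)
            elif t <= lo:
                delta[(s, a)] = "veryGood"        # rule 2(c)
            else:
                delta[(s, a)] = t                 # rule 2(a)
    states = set(gaps) | {"bad", "veryGood"}
    return states, 0, delta, {"bad"}


def accepts(dfa, word):
    """Run the (deterministic, complete) DFA on a finite word."""
    _, state, delta, accepting = dfa
    for a in word:
        state = delta[(state, a)]
    return state in accepting


# Cross-check Theorem 2 against Lemma 1 (gap(W, d) > T) on all short words.
for mu, d in ((1, 2), (3, 2), (4, 3), (5, 4)):
    dfa = build_bad_dfa(mu, d)
    T = Fraction(mu, d - 1)
    for n in range(1, 5):
        for w in product(range(-mu, mu + 1), repeat=n):
            g = 0
            for v in w:
                g = d * g + v                     # exact integer recoverable gap
            assert accepts(dfa, w) == (g > T)
```

The automaton has $\lfloor T \rfloor - \lfloor -T \rfloor + 2$ states, which is $O(\mu)$; for example, $\mu = 4$ and $d = 3$ give the states $\{-1, 0, 1, 2, \mathit{bad}, \mathit{veryGood}\}$.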


Theorem 2. Let $\mu$ be the upper bound, and $d > 1$ be the integer discount-factor. $\mathit{bad}(\mu, d, \leq)$ accepts a finite, bounded weight-sequence iff it is a bad-prefix of $\mathcal{A}^{\mu,d}_{\leq}$.


Proof (Proof sketch). First note that the transition relation is deterministic and complete. Therefore, every word has a unique run in $\mathit{bad}(\mu, d, \leq)$. Let $\mathit{last}$ be the last state in the run of a finite, bounded weight-sequence $W$ in the DFA. Use induction on the length of $W$ to prove the following:


- $\mathit{last} \in \{\lfloor -T \rfloor + 1, \ldots, \lfloor T \rfloor\}$ iff $gap(W,d) = \mathit{last}$
- $\mathit{last} = \mathit{bad}$ iff $gap(W,d) > \lfloor T \rfloor$
- $\mathit{last} = \mathit{veryGood}$ iff $gap(W,d) \leq \lfloor -T \rfloor$


Therefore, a finite, bounded weight-sequence is accepted iff its recoverable gap is greater than $\lfloor T \rfloor$, in other words, iff it is a bad-prefix of $\mathcal{A}^{\mu,d}_{\leq}$.


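$\mathcal{A}^{\mu,d}_{\leq}$ is obtained from $\mathit{bad}(\mu, d, \leq)$ by applying the co-Büchi acceptance condition: since $\mathit{bad}$ is an absorbing sink, an infinite word is accepted iff its run never visits $\mathit{bad}$.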
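Because $\mathit{bad}$ is absorbing, this acceptance condition is a safety condition, and membership of an ultimately periodic word $u \cdot v^\omega$ can be decided by finite simulation. The Python sketch below does this under the assumption that inputs are given in lasso form, computing transitions of $\mathit{bad}(\mu, d, \leq)$ on the fly, and cross-checks the verdict against the sign of the exact discounted sum: by Theorems 1 and 2 the automaton should accept exactly when $DS(u \cdot v^\omega, d) \leq 0$. The lasso representation and all function names are hypothetical.

```python
from fractions import Fraction


def step(state, a, mu, d):
    """One transition of bad(mu, d, <=) for integer d > 1, computed on the fly."""
    hi, lo = mu // (d - 1), (-mu) // (d - 1)   # floor(T) and floor(-T), T = mu / (d - 1)
    if state in ("bad", "veryGood"):
        return state                           # both sinks are absorbing
    t = d * state + a
    if t > hi:
        return "bad"
    if t <= lo:
        return "veryGood"
    return t


def comparator_accepts_lasso(u, v, mu, d):
    """Acceptance of u . v^omega by the deterministic safety automaton:
    accept iff the run never reaches 'bad'. Requires v to be non-empty."""
    state = 0
    for a in u:
        state = step(state, a, mu, d)
    seen = set()
    while state not in seen:                   # the state at each loop entry eventually repeats
        seen.add(state)
        for a in v:
            state = step(state, a, mu, d)
    # 'bad' is absorbing, and later iterations replay already-seen loop entries,
    # so the run visits 'bad' iff the current state is 'bad'.
    return state != "bad"


def ds_lasso(u, v, d):
    """Exact DS(u . v^omega, d), used only to cross-check the automaton."""
    d = Fraction(d)

    def ds(w):
        return sum((Fraction(x) / d**i for i, x in enumerate(w)), Fraction(0))

    return ds(u) + ds(v) / (1 - d ** (-len(v))) / d ** len(u)
```

For instance, with $\mu = 2$ and $d = 3$ the word $1 \cdot (-2)^\omega$ has discounted sum exactly 0 and is accepted, while $2 \cdot 0^\omega$ has discounted sum 2 and is rejected after its first letter.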


Theorem 3. Let $\mu$ be the upper bound, and $d > 1$ be the integer discount-factor. DS-comparators for all inequalities and equality are either deterministic safety or deterministic co-safety automata with $O(\mu)$ states.


As a matter of fact, the most compact non-deterministic DS-comparator constructions with parameters $\mu$, $d$ and $R$ also contain $O(\mu)$ states [11].


3.3 Quantitative Inclusion with Safety/Co-safety Comparators 


This section investigates quantitative language inclusion with regular safety/co-safety comparators. Unlike quantitative inclusion with regular comparators, quantitative inclusion with regular safety/co-safety comparators is able to circumvent Büchi complementation with intermediate subset-construction steps. As a result, the complexity of quantitative inclusion with regular safety/co-safety comparators is lower than the same with regular comparators [12] (Theorem 4). Finally, since DS-comparators are regular safety/co-safety comparators, the algorithm for quantitative inclusion with regular safety/co-safety comparators applies to DS-inclusion, yielding a lower-complexity algorithm for DS-inclusion (Corollary 5).


Key Ideas. A run of word $w$ in a weighted-automaton is maximal if its weight is the supremum weight of all runs of $w$ in the weighted-automaton. A run $\rho_P$ of $w$ in $P$ is a counterexample for $P \subseteq Q$ (or $P \subset Q$) iff there exists a maximal run $\mathit{sup}_Q$ of $w$ in $Q$ such that $wt(\rho_P) > wt(\mathit{sup}_Q)$ (or $wt(\rho_P) \geq wt(\mathit{sup}_Q)$). Consequently, $P \subseteq Q$ (or $P \subset Q$) iff there are no counterexample runs in $P$. Therefore, the roadmap to solve quantitative inclusion for regular safety/co-safety comparators is as follows:


1. Use regular safety/co-safety comparators to construct the maximal automaton of $Q$, i.e. an automaton that accepts all maximal runs of $Q$ (Corollary 3).

2. Use the regular safety/co-safety comparator and the maximal automaton to construct a counterexample automaton that accepts all counterexample runs of the inclusion problem $P \subseteq Q$ (or $P \subset Q$) (Lemma 5).

3. Solve quantitative inclusion for safety/co-safety comparators by checking for emptiness of the counterexample automaton (Theorem 4).

Finally, since DS-comparators are regular safety/co-safety automata (Corollary 1), apply Theorem 4 to obtain an algorithm for DS-inclusion that uses regular safety/co-safety comparators (Corollary 5).


Let $W$ be a weighted automaton. Then the annotated automaton of $W$, denoted by $\hat{W}$, is the Büchi automaton obtained by transforming each transition $s \xrightarrow{a} t$ with weight $v$ in $W$ to the transition $s \xrightarrow{(a,v)} t$ in $\hat{W}$. Observe that $\hat{W}$ is a safety automaton since all its states are accepting. A run on word $w$ with weight-sequence $wt$ in $W$ corresponds to an annotated word $(w, wt)$ in $\hat{W}$, and vice-versa.
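The annotation step is a relabelling of transitions. The Python sketch below represents a weighted automaton as a dictionary of hypothetical shape (transitions as tuples `(s, a, v, t)`), builds its annotated automaton, and checks the run correspondence on finite prefixes: a pair of letters and weights labels a path from the initial state iff the word has a run with that weight-sequence. The representation and the example automaton are illustrative assumptions.

```python
def annotate(weighted):
    """Annotated automaton of a weighted automaton.

    `weighted` has keys 'states', 'initial' and 'transitions'; each transition is a
    tuple (s, a, v, t): source, letter, weight, target. Each s --a--> t with weight v
    becomes s --(a, v)--> t, and every state is accepting (a safety automaton)."""
    return {
        "states": set(weighted["states"]),
        "initial": weighted["initial"],
        "transitions": {(s, (a, v), t) for (s, a, v, t) in weighted["transitions"]},
        "accepting": set(weighted["states"]),
    }


def is_run(annotated, word, weights):
    """True iff (word, weights) labels some finite path from the initial state."""
    current = {annotated["initial"]}
    for letter in zip(word, weights):
        current = {t for (s, lab, t) in annotated["transitions"]
                   if s in current and lab == letter}
    return bool(current)


# Hypothetical example weighted automaton over the single letter 'a'.
W = {
    "states": {"q0", "q1"},
    "initial": "q0",
    "transitions": {("q0", "a", 1, "q1"), ("q1", "a", -1, "q0"), ("q0", "a", 0, "q0")},
}
W_hat = annotate(W)
```

In this example the word `aa` has runs with weight-sequences (1, -1), (0, 0) and (0, 1), but none with (1, 1), because `q1` only has a transition of weight -1.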


Maximal Automaton. This section covers the construction of the maximal automaton from a weighted automaton. Let $W$ and $\hat{W}$ be a weighted automaton and its annotated automaton, respectively. We call an annotated word $(w, wt_1)$ in $\hat{W}$ maximal if for all other words of the form $(w, wt_2)$ in $\hat{W}$, $wt(wt_1) \geq wt(wt_2)$. Clearly, $(w, wt_1)$ is a maximal word in $\hat{W}$ iff word $w$ has a run with weight-sequence $wt_1$ in $W$ that is maximal. We define the maximal automaton of weighted automaton $W$, denoted Maximal(W), to be the automaton that accepts all maximal words of its annotated automaton $\hat{W}$.

We show that when the comparator is regular safety/co-safety, the construction of the maximal automaton incurs a $2^{O(n)}$ blow-up. This section exposes the construction of the maximal automaton when the comparator for non-strict inequality is regular safety. The other case, when the comparator for strict inequality is regular co-safety, has been deferred to the appendix.


Lemma 3. Let W be a weighted automaton with regular safety comparator for 
non-strict inequality. Then the language of Maximal(W) is a safety language. 


Proof (Proof sketch). An annotated word $(w, wt_1)$ is not maximal in $\hat{W}$ for one of the following two reasons: either $(w, wt_1)$ is not a word in $\hat{W}$, or there exists another word $(w, wt_2)$ in $\hat{W}$ s.t. $wt(wt_1) < wt(wt_2)$ (equivalently, $(wt_1, wt_2)$ is not in the comparator for non-strict inequality $\geq$). Both $\hat{W}$ and the comparator for non-strict inequality are safety languages, so the language of maximal words must also be a safety language.


We now proceed to construct the safety automaton for Maximal(W).


Intuition. The intuition behind the construction of the maximal automaton follows directly from the definition of maximal words. Let $\hat{W}$ be the annotated automaton for weighted automaton $W$. Let $\hat{\Sigma}$ denote the alphabet of $\hat{W}$. Then an annotated word $(w, wt_1) \in \hat{\Sigma}^\omega$ is a word in Maximal(W) iff (a) $(w, wt_1) \in \hat{W}$, and (b) for all words $(w, wt_2) \in \hat{W}$, $wt(wt_1) \geq wt(wt_2)$.

The challenge here is to construct an automaton for condition (b). Intuitively, this automaton simulates the following action: as the automaton reads word $(w, wt_1)$, it must spawn all words of the form $(w, wt_2)$ in $\hat{W}$, while also ensuring that $wt(wt_1) \geq wt(wt_2)$ holds for every word $(w, wt_2)$ in $\hat{W}$. Since $\hat{W}$ is a safety automaton, for a word $(w, wt_1) \in \hat{\Sigma}^\omega$, all words of the form $(w, wt_2) \in \hat{W}$ can be traced by subset-construction. Similarly, since the comparator $C$ for non-strict inequality ($\geq$) is a safety automaton, all words of the form $(wt_1, wt_2) \in C$ can be traced by subset-construction as well. The construction needs to carefully align the word $(w, wt_1)$ with all possible $(w, wt_2) \in \hat{W}$ and $(wt_1, wt_2) \in C$.


Construction of Maximal(W). Let $W$ be a weighted automaton with annotated automaton $\hat{W}$, and let $C$ denote its regular safety comparator for non-strict inequality. Let $S_W$ denote the set of states of $W$ (and $\hat{W}$) and $S_C$ denote the set of states of $C$. We define Maximal(W) $= (S, s_I, \hat{\Sigma}, \delta, F)$ as follows:


- The set of states $S$ consists of tuples of the form $(s, X)$, where $s \in S_W$ and $X \subseteq \{(t, c) \mid t \in S_W, c \in S_C\}$

- $\hat{\Sigma}$ is the alphabet of $\hat{W}$

- Initial state $s_I = (s_w, \{(s_w, s_c)\})$, where $s_w$ and $s_c$ are the initial states in $\hat{W}$ and $C$, respectively.

- Let states $(s, X), (s', X') \in S$ such that $X = \{(t_1, c_1), \ldots, (t_n, c_n)\}$ and $X' = \{(t'_1, c'_1), \ldots, (t'_m, c'_m)\}$. Then $(s, X) \xrightarrow{(a,v)} (s', X') \in \delta$ iff

  1. $s \xrightarrow{(a,v)} s'$ is a transition in $\hat{W}$, and

  2. $X'$ is exactly the set of pairs $(t'_j, c'_j)$ for which there exist $(t_i, c_i) \in X$ and a weight $v'$ such that $t_i \xrightarrow{(a,v')} t'_j$ and $c_i \xrightarrow{(v,v')} c'_j$ are transitions in $\hat{W}$ and $C$, respectively.

- $(s, \{(t_1, c_1), \ldots, (t_n, c_n)\}) \in F$ iff $s$ and all $t_i$ are accepting in $\hat{W}$, and all $c_i$ are accepting in $C$.
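Rule 2 is a subset construction over pairs $(t, c)$. The Python sketch below computes the successors of a state $(s, X)$ on an annotated letter $(a, v)$ directly from the transition sets of $\hat{W}$ and $C$, which is the property an on-the-fly exploration depends on. The data representation, the function names and the toy comparator are hypothetical: the toy comparator accepts pairs of weight-sequences that are pointwise $\geq$, a simple safety language standing in for a real DS-comparator.

```python
def maximal_successors(state, letter, w_trans, c_trans):
    """Successors of a Maximal(W) state (s, X) on the annotated letter (a, v).

    w_trans: transitions (s, (a, v), t) of the annotated automaton W-hat.
    c_trans: transitions (c, (v, v2), c2) of the comparator C for >=.
    X is a frozenset of pairs (t, c) tracked by subset construction."""
    s, X = state
    a, v = letter
    # Rule 2: X' holds every (t2, c2) reachable from some (t, c) in X by reading
    # (a, v2) in W-hat and (v, v2) in C, for some weight v2.
    X2 = frozenset(
        (t2, c2)
        for (t, c) in X
        for (t_src, (b, v2), t2) in w_trans if t_src == t and b == a
        for (c_src, (x, y), c2) in c_trans if c_src == c and (x, y) == (v, v2)
    )
    # Rule 1: the first component follows W-hat on (a, v) itself.
    return {(s2, X2) for (s_src, lab, s2) in w_trans if s_src == s and lab == letter}


def is_accepting(state, w_accepting, c_accepting):
    """(s, X) is accepting iff s and every t are accepting in W-hat and every c in C."""
    s, X = state
    return s in w_accepting and all(t in w_accepting and c in c_accepting for (t, c) in X)


# Toy data (hypothetical): a one-state automaton with a-transitions of weight 1 and 0,
# and a pointwise ">=" safety comparator with rejecting sink 'cbad'.
W_TRANS = {("q", ("a", 1), "q"), ("q", ("a", 0), "q")}
C_TRANS = {(c, (x, y), "c0" if c == "c0" and x >= y else "cbad")
           for c in ("c0", "cbad") for x in (0, 1) for y in (0, 1)}
INIT = ("q", frozenset({("q", "c0")}))
```

On the toy data, reading $(a, 1)$ keeps $X = \{(q, c_0)\}$, whereas reading $(a, 0)$ adds the rejecting comparator state `cbad` to $X$ (the competing run of weight 1 dominates), so that successor is non-accepting.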


Lemma 4. Let $W$ be a weighted automaton with regular safety comparator $C$ for non-strict inequality. Then the size of Maximal(W) is $|W| \cdot 2^{O(|W||C|)}$.


Proof (Proof sketch). A state $(s, \{(t_1, c_1), \ldots, (t_n, c_n)\})$ is non-accepting in the automaton if one of $s$, $t_i$ or $c_i$ is non-accepting in the underlying automaton $\hat{W}$ or the comparator. Since $\hat{W}$ and the comparator automaton are safety, all outgoing transitions from a non-accepting state go to non-accepting states in the underlying automata. Therefore, all outgoing transitions from a non-accepting state in Maximal(W) go to non-accepting states in Maximal(W), so Maximal(W) is a safety automaton. To see correctness of the transition relation, one must prove that transitions of type (1.) satisfy condition (a), while transitions of type (2.) satisfy condition (b). Maximal(W) forms the conjunction of (a) and (b), hence accepts the language of maximal words of $\hat{W}$. For the size, each state pairs one of the $|W|$ states $s$ with a subset $X$ of $S_W \times S_C$, of which there are $2^{|W||C|}$.


A similar construction proves that the maximal automaton of a weighted automaton $W$ with regular co-safety comparator $C$ for strict inequality contains $|W| \cdot 2^{O(|W||C|)}$ states. In this case, however, the maximal automaton may not be a safety automaton. Therefore, Lemma 4 generalizes to:


Corollary 3. Let $W$ be a weighted automaton with regular safety/co-safety comparator $C$. Then Maximal(W) is a Büchi automaton of size $|W| \cdot 2^{O(|W||C|)}$.

Counterexample Automaton. This section covers the construction of the counterexample automaton. Given weighted-automata $P$ and $Q$, an annotated word $(w, wt_P)$ in the annotated automaton $\hat{P}$ is a counterexample word of $P \subseteq Q$ (or $P \subset Q$) if there exists $(w, wt_Q)$ in Maximal(Q) s.t. $wt(wt_P) > wt(wt_Q)$ (or $wt(wt_P) \geq wt(wt_Q)$). Clearly, the annotated word $(w, wt_P)$ is a counterexample word iff there exists a counterexample run of $w$ with weight-sequence $wt_P$ in $P$.

For this section, we abbreviate strict and non-strict to strct and nstrct, respectively. For $inc \in \{strct, nstrct\}$, the counterexample automaton for inc-quantitative inclusion, denoted by Counterexample(inc), is the automaton that contains all counterexample words of the problem instance. We construct the counterexample automaton as follows:


Lemma 5. Let $P$, $Q$ be weighted-automata with regular safety/co-safety comparators. For $inc \in \{strct, nstrct\}$, Counterexample(inc) is a Büchi automaton.


Proof. We construct the Büchi automaton Counterexample(inc) for $inc \in \{strct, nstrct\}$ that contains the counterexample words of inc-quantitative inclusion. Since the comparators are regular safety/co-safety, Maximal(Q) is a Büchi automaton (Corollary 3). Construct the product $\hat{P} \times$ Maximal(Q) such that transition $(p_1, q_1) \xrightarrow{(a, v_1, v_2)} (p_2, q_2)$ is in the product iff $p_1 \xrightarrow{(a, v_1)} p_2$ and $q_1 \xrightarrow{(a, v_2)} q_2$ are transitions in $\hat{P}$ and Maximal(Q), respectively. A state $(p, q)$ is accepting if both $p$ and $q$ are accepting in $\hat{P}$ and Maximal(Q). One can show that the product accepts $(w, wt_P, wt_Q)$ iff $(w, wt_P)$ and $(w, wt_Q)$ are words in $\hat{P}$ and Maximal(Q), respectively.

If $inc = strct$, intersect $\hat{P} \times$ Maximal(Q) with the comparator for $\geq$. If $inc = nstrct$, intersect $\hat{P} \times$ Maximal(Q) with the comparator for $>$. Since the comparator is a safety or co-safety automaton, the intersection is taken without the cyclic counter. Therefore, $(s_1, t_1) \xrightarrow{(a, v_1, v_2)} (s_2, t_2)$ is a transition in the intersection iff $s_1 \xrightarrow{(a, v_1, v_2)} s_2$ and $t_1 \xrightarrow{(v_1, v_2)} t_2$ are transitions in the product and the appropriate comparator, respectively. State $(s, t)$ is accepting if both $s$ and $t$ are accepting. The intersection will accept $(w, wt_P, wt_Q)$ iff $(w, wt_P)$ is a counterexample of inc-quantitative inclusion. Counterexample(inc) is obtained by projecting out the weight contributed by $Q$ in the intersection as follows: transition $m \xrightarrow{(a, v_1, v_2)} n$ is transformed to $m \xrightarrow{(a, v_1)} n$.
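The three steps of this proof (synchronous product, intersection without a cyclic counter, projection) can be sketched over explicit transition sets. In the Python sketch below, the tuple representation, the function names, and the toy data (including a pointwise ">" comparator standing in for a real DS-comparator) are hypothetical; the point is only to show how the labels are combined and then projected.

```python
def counterexample_transitions(p_trans, m_trans, c_trans):
    """Transitions (source, (a, v1), target) of Counterexample(inc).

    p_trans: transitions (p, (a, v1), p2) of the annotated automaton P-hat.
    m_trans: transitions (q, (a, v2), q2) of Maximal(Q).
    c_trans: transitions (c, (v1, v2), c2) of the comparator for >= (strct) or > (nstrct)."""
    # Product P-hat x Maximal(Q): synchronise on the letter a and keep both weights.
    prod = {((p, q), (a, v1, v2), (p2, q2))
            for (p, (a, v1), p2) in p_trans
            for (q, (b, v2), q2) in m_trans if a == b}
    # Intersection with the comparator as a plain synchronous product
    # (no cyclic counter, since the comparator is safety/co-safety).
    inter = {((s, c), (a, v1, v2), (s2, c2))
             for (s, (a, v1, v2), s2) in prod
             for (c, pair, c2) in c_trans if pair == (v1, v2)}
    # Projection: drop the weight v2 that came from Q.
    return {(m, (a, v1), n) for (m, (a, v1, _v2), n) in inter}


def counterexample_accepting(transitions, p_acc, m_acc, c_acc):
    """States ((p, q), c) occurring in `transitions` whose components are all accepting."""
    states = {m for (m, _, _) in transitions} | {n for (_, _, n) in transitions}
    return {((p, q), c) for ((p, q), c) in states
            if p in p_acc and q in m_acc and c in c_acc}


# Toy data (hypothetical): P-hat always reads (a, 1), Maximal(Q) always reads (a, 0),
# and a pointwise ">" comparator with rejecting sink 'cbad'.
P_TRANS = {("p", ("a", 1), "p")}
M_TRANS = {("q", ("a", 0), "q")}
GT_TRANS = {(c, (x, y), "c0" if c == "c0" and x > y else "cbad")
            for c in ("c0", "cbad") for x in (0, 1) for y in (0, 1)}
```

On the toy data the accepting state $((p, q), c_0)$ carries a self-loop labelled $(a, 1)$, i.e. the lasso $(a, 1)^\omega$ is a counterexample word, as expected when $P$ always outweighs $Q$.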


Quantitative Inclusion and DS-inclusion. In this section, we give the final algorithm for quantitative inclusion with regular safety/co-safety comparators. Since DS-comparators are regular safety/co-safety comparators, this gives us an algorithm for DS-inclusion with improved complexity over previous results.


Theorem 4. Let $P$, $Q$ be weighted-automata with regular safety/co-safety comparators. Let $C_{\leq}$ and $C_{<}$ be the comparators for $\leq$ and $<$, respectively. Then:


- Strict quantitative inclusion $P \subset Q$ is reduced to emptiness checking of a Büchi automaton of size $|P||C_{\leq}||Q| \cdot 2^{O(|Q||C_{\leq}|)}$.

- Non-strict quantitative inclusion $P \subseteq Q$ is reduced to emptiness checking of a Büchi automaton of size $|P||C_{<}||Q| \cdot 2^{O(|Q||C_{<}|)}$.

Proof. Strict and non-strict are abbreviated to strct and nstrct, respectively. For $inc \in \{strct, nstrct\}$, inc-quantitative inclusion holds iff Counterexample(inc) is empty. The size of Counterexample(inc) is the product of the sizes of $P$, Maximal(Q) (Corollary 3), and the appropriate comparator, as described in Lemma 5.


In contrast, quantitative inclusion with regular comparators reduces to emptiness of a Büchi automaton with $|P| \cdot 2^{O(|P||Q||C| \log(|P||Q||C|))}$ states [12]. The $2^{O(n \log n)}$ blow-up is unavoidable due to Büchi complementation. Hence, quantitative inclusion with regular safety/co-safety comparators has lower worst-case complexity.

Lastly, we use the results developed in previous sections to solve DS-inclusion. Since DS-comparators are regular safety/co-safety (Corollary 1), an immediate consequence of Theorem 4 is an improvement in the worst-case complexity of DS-inclusion in comparison to prior results with regular DS-comparators. Furthermore, since the regular safety/co-safety DS-comparators are of the same size for all inequalities (Theorem 3), we get:


Corollary 4. Let $P$, $Q$ be weighted-automata, and $C$ be a regular safety/co-safety DS-comparator with integer discount-factor $d > 1$. Strict DS-inclusion reduces to emptiness checking of a safety automaton of size $|P||C||Q| \cdot 2^{O(|Q||C|)}$.


Proof (Proof sketch). When the comparator for non-strict inequality is a safety automaton, as it is for DS-comparators, the maximal automaton is a safety automaton (Lemma 3). One can then show that the counterexample automaton is also a safety automaton.


A similar argument proves that non-strict DS-inclusion reduces to emptiness of a weak-Büchi automaton [27] of size $|P||C||Q| \cdot 2^{O(|Q||C|)}$ (see Appendix).


Corollary 5 (DS-inclusion with safety/co-safety comparator). Let $P$, $Q$ be weighted-automata, and $C$ be a regular (co-)safety DS-comparator with integer discount-factor $d > 1$. The complexity of DS-inclusion is $|P||C||Q| \cdot 2^{O(|Q||C|)}$.


4 Implementation and Experimental Evaluation 


The goal of the empirical analysis is to examine the performance of DS-inclusion with integer discount-factor with safety/co-safety comparators against existing tools, to investigate the practical merit of our algorithm. We compare against (a) the regular-comparator based tool QuIP, and (b) the DS-determinization and linear-programming tool DetLP.

QuIP is written in C++, and invokes the state-of-the-art Büchi language inclusion-solver RABIT [2]. We enable the -fast flag in RABIT, and tune its Java-threads with Xss, Xms, Xmx set to 1GB, 1GB and 8GB, respectively. DetLP is also written in C++, and uses the linear programming solver GLPSOL provided by GLPK (GNU Linear Programming Kit) [1]. We compare these tools along two axes: runtime and number of benchmarks solved.

Fig. 1. $s_P = s_Q$ on x-axis, $wt = 4$, $\delta = 3$, $d = 3$, $P \subset Q$


Implementation Details. The algorithms for strict DS-inclusion with integer discount-factor $d > 1$ (Corollary 4) and for non-strict DS-inclusion check for emptiness of the counterexample automata. A naive algorithm would construct the counterexample automata fully, and then check whether they are empty by ensuring the absence of an accepting lasso.

We implement a more efficient algorithm. In our implementation, we make use of the fact that the constructions for DS-inclusion use subset-construction intermediate steps. This facilitates an on-the-fly procedure, since the successor states of a state in the counterexample automaton can be determined directly from the input weighted automata and the comparator automaton. The algorithm terminates as soon as an accepting lasso is detected. When an accepting lasso is absent, the algorithm traverses all states and edges of the counterexample automaton.
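The text does not spell out the search procedure used by QuIPFly. As one standard way to realize an on-the-fly check for an accepting lasso, the Python sketch below uses the nested depth-first search of Courcoubetis, Vardi, Wolper and Yannakakis: it calls a successor function lazily, so states are generated only when reached, and it stops as soon as an accepting state on a cycle is found. Treating this as QuIPFly's actual algorithm would be an assumption; the recursion is kept for brevity, and a large state space would need an explicit stack.

```python
def has_accepting_lasso(initial, successors, is_accepting):
    """Nested DFS: True iff some accepting state is reachable from `initial` and lies
    on a cycle. `successors(state)` is only called on states the search reaches."""
    visited, flagged = set(), set()

    def inner(seed, state):
        # Second search: look for a path from `state` back to `seed`.
        for nxt in successors(state):
            if nxt == seed:
                return True
            if nxt not in flagged:
                flagged.add(nxt)
                if inner(seed, nxt):
                    return True
        return False

    def outer(state):
        visited.add(state)
        for nxt in successors(state):
            if nxt not in visited and outer(nxt):
                return True
        # Post-order: start the inner search only from accepting states.
        return is_accepting(state) and inner(state, state)

    return outer(initial)


GRAPH = {0: [1], 1: [2], 2: [1], 3: [3]}   # hypothetical explicit graph for testing
```

In the setting of the paper, `successors` would be computed from the input weighted automata and the comparator, as described above, rather than looked up in an explicit graph such as `GRAPH`.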

We implement the optimized on-the-fly algorithm in a prototype QuIPFly. 
QuIPFly is written in Python 2.7.12. QuIPFly employs basic implementation-level 
optimizations to avoid excessive re-computation. 


Design and Setup for Experiments. Due to the lack of standardized benchmarks for weighted automata, we follow a standard approach to performance evaluation of automata-theoretic tools [3,30,38] by experimenting with randomly generated benchmarks, using the random benchmark generation procedure described in [11]. The parameters for each experiment are the number of states $s_P$ and $s_Q$ of the weighted automata, transition density $\delta$, maximum weight $wt$, integer discount-factor $d$, and $inc \in \{strct, nstrct\}$. In each experiment, weighted automata $P$ and $Q$ are randomly generated, and the runtime of inc-DS-inclusion for all three tools is reported with a timeout of 900s. We run the experiment for each parameter tuple 50 times. All experiments are run on a single node of a high-performance cluster consisting of two quad-core Intel-Xeon processors running at 2.83 GHz, with 8 GB of memory per node. We experiment with $s_P = s_Q$ ranging from 0-1500 in increments of 25, $\delta \in \{3, 3.5, 4\}$, $d = 3$, and $wt \in \{d^1 + 1, d^{?} - 1, d^4 - 1\}$.

Fig. 2. $s_P = s_Q = 75$, $wt = 4$, $\delta = 3$, $d = 3$, $P \subset Q$


Observations and Inferences.¹ For clarity of exposition, we present the observations for only one parameter-tuple. Trends and observations for other parameters were similar.


QuIPFly Outperforms QuIP by at least an order of magnitude in runtime. Figure 1 plots the median runtime of all 50 experiments for the given parameter-values for QuIP and QuIPFly. More importantly, QuIPFly solves all of our benchmarks within a fraction of the timeout, whereas QuIP struggled to solve at least 50% of the benchmarks with larger inputs (beyond $s_P = s_Q = 1000$). The primary cause of failure is memory overflow inside RABIT. We conclude that regular safety/co-safety comparators outperform their regular counterparts, giving credit to the simpler subset-constructions vs. Büchi complementation.


QuIPFly Outperforms DetLP comprehensively in runtime and in number of benchmarks solved. We were unable to plot DetLP in Fig. 1 since it solved fewer than 50% of the benchmarks even with small input instances. Figure 2 compares the runtime of both tools on the same set of 50 benchmarks for a representative parameter-tuple on which all 50 benchmarks were solved. The plot shows that QuIPFly beats DetLP by 2 to 4 orders of magnitude on all benchmarks.


Overall Verdict. Overall, QuIPFly outperforms QuIP and DetLP by a significant margin along both axes, runtime and number of benchmarks solved. This analysis gives consistent evidence in favor of our safety/co-safety approach to solving DS-inclusion.


1 Figures are best viewed online and in color. 


76 S. Bansal and M. Y. Vardi 


5 Concluding Remarks 


The goal of this paper was to build scalable algorithms for DS-inclusion. To 
this end, this paper furthers the understanding of language-theoretic proper- 
ties of discounted-sum aggregate function by demonstrating that DS-comparison 
languages form safety and co-safety languages, and utilizes these properties to 
obtain a decision procedure for DS-inclusion that offers both tighter theoretical 
complexity and improved scalability. All in all, the key insights of this work are: 


1. Pure automata-theoretic techniques of DS-comparator are better for DS- 
inclusion; 

2. In-depth language-theoretic analysis improves both theoretical complexity and practical scalability of DS-inclusion;

3. DS-comparators are compact deterministic safety or co-safety automata. 


To the best of our knowledge, this is the first work that applies language-theoretic properties such as safety/co-safety in the context of quantitative reasoning.

More broadly, this paper demonstrates that the close integration of language-theoretic and quantitative properties can render novel algorithms for quantitative reasoning that can benefit from advances in qualitative reasoning.

Abstract. We present algorithms and techniques for the repair of timed system
models, given as networks of timed automata (NTA). The repair is based on an 
analysis of timed diagnostic traces (TDTs) that are computed by real-time model 
checking tools, such as UPPAAL, when they detect the violation of a timed safety 
property. We present an encoding of TDTs in linear real arithmetic and use the 
MaxSMT capabilities of the SMT solver Z3 to compute possible repairs to clock 
bound values that minimize the necessary changes to the automaton. We then 
present an admissibility criterion, called functional equivalence, that assesses 
whether a proposed repair is admissible in the overall context of the NTA. We 
have implemented a proof-of-concept tool called TARTAR for the repair and 
admissibility analysis. To illustrate the method, we have considered a number of 
case studies taken from the literature and automatically injected changes to clock 
bounds to generate faulty mutations. Our technique is able to compute a feasible 
repair for 91% of the faults detected by UPPAAL in the generated mutants. 


Keywords: Timed automata · Automated repair · Admissibility of repair · TARTAR tool


1 Introduction 


The analysis of system design models using model checking technology is an important 
step in the system design process. It enables the automated verification of system prop- 
erties against given design models. The automated nature of model checking facilitates 
the integration of the verification step into the design process since it requires no further 
intervention of the designer once the model has been formulated and the property has 
been specified. 

Often it is sufficient to abstract from real time aspects when checking system proper- 
ties, in particular when the focus is on functional aspects of the system. However, when 
non-functional properties, such as response times or the timing of periodic behavior, 
play an important role, it is necessary to incorporate real time aspects into the models 
and the specification, as well as to use specialized real-time model checking tools, such 
as UPPAAL [6], Kronos [31] or opaal [11] during the verification step. 

© The Author(s) 2019

Next to the automatic nature of model checking, the ability to return counterexamples, in real-time model checking often referred to as timed diagnostic traces (TDT), is a further practical benefit of the use of model checking technology. A TDT describes a
timed sequence of steps that lead the design model from the initial state of the system 
into a state violating a real-time property. A TDT neither constitutes a causal explana- 
tion of the property violation, nor does it provide hints as to how to correct the model. 
In this paper we describe an automated method that computes proposals for possible 
repairs of a network of timed automata (NTA) that avoid the violation of a timed safety 
property. Consider the TDT depicted as a time annotated sequence diagram [5] in Fig. 1. 
This scenario describes a simple message exchange where the process dbServer sends a message req to process db, which, after some processing steps, returns a message ser to dbServer. Assume that the system is required to ensure that the time from sending req to receiving ser is at most 4 time units. Assume further that the timing interval annotations on the sequence diagram represent the minimum and maximum time for the message transmission and processing steps that the NTA, from which the diagram has been derived, permits. It is then easy to see that the system can be executed in such a way that this property is violated.
Various changes to the underlying NTA model, depicted in Fig. 2, may avoid this property violation. For instance, the maximum time it takes to transmit either the req or the ser message can be constrained to be at most 1 time unit. Alternatively, it may be possible to avoid the property violation by reducing two of the three timings by 0.5 time units. In any case, proposing such changes to the model may either serve to correct clerical mistakes made during the editing of the model, or point to necessary changes in the dimensioning of its time resources, thus contributing to improved design space exploration.
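The arithmetic behind this scenario can be checked in a few lines. The following Python sketch is only an illustration: the interval values are read off Fig. 1, the choice of req and ser as the two timings reduced by 0.5 is our assumption, and the names `INTERVALS`, `worst_case` and `with_upper_bounds` are hypothetical. It sums the maximal durations of the three steps and evaluates the candidate changes mentioned above against the 4-time-unit requirement.

```python
from fractions import Fraction

# Timing intervals (min, max) read off Fig. 1: transmission of req,
# processing in db, and transmission of ser (assumed mapping).
INTERVALS = {"req": (1, 2), "processing": (1, 1), "ser": (1, 2)}
DEADLINE = 4  # required bound on the time from sending req to receiving ser


def worst_case(intervals):
    """Largest end-to-end time: every step takes its maximal duration."""
    assert all(lo <= hi for lo, hi in intervals.values()), "empty interval"
    return sum(Fraction(hi) for _, hi in intervals.values())


def with_upper_bounds(intervals, changes):
    """Copy of `intervals` where the upper bounds listed in `changes` are replaced."""
    updated = dict(intervals)
    for step, new_hi in changes.items():
        updated[step] = (updated[step][0], new_hi)
    return updated


original = worst_case(INTERVALS)  # 5, so the requirement can be violated
candidates = {
    "req at most 1": with_upper_bounds(INTERVALS, {"req": 1}),
    "ser at most 1": with_upper_bounds(INTERVALS, {"ser": 1}),
    "req and ser reduced by 0.5": with_upper_bounds(
        INTERVALS, {"req": Fraction(3, 2), "ser": Fraction(3, 2)}),
}
for name, intervals in candidates.items():
    print(name, worst_case(intervals) <= DEADLINE)
```

Exact fractions are used so that the half-unit reductions are compared without floating-point rounding.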

The repair method described in this paper 
relies on an encoding of a TDT as a constraint 
system in linear real arithmetic. This encoding provides a symbolic abstract semantics 
for the TDT by constraining the sojourn time of the NTA in the locations visited along 
the trace. The constraint system is then augmented by auxiliary model variation vari- 
ables which represent syntactic changes to the NTA model, for instance the variation 
of a location invariant condition or a transition guard. We assert that the thus modi- 
fied constraint system implies the non-reachability of a violation. At the same time, we 
assert that the model variation variables have a value that implies that no change of the 
NTA model will occur, for instance by setting a clock bound variation variable to 0. 
This renders the resulting constraint system unsatisfiable. 

In order to compute a repair, we derive a partial MaxSMT instance by turning the constraints that disable any repair into soft constraints. We solve this MaxSMT instance using the SMT solver Z3 [25]. If the MaxSMT instance admits a solution, the resulting model provides values of the model variation variables. These values indicate a repair of the NTA model which entails that along the sequence of locations represented by the TDT, the property violation will no longer be reachable.

Fig. 1. TDT represented as a sequence diagram with timing annotations (lifelines dbServer and db; interval annotations [1,2], [1,1] and [1,2])

In a next step it is necessary to check whether the computed repair is an admissi- 
ble repair in the context of the full NTA. This is important since the repair was com- 
puted locally with respect to only a single given TDT. Thus, it is necessary to define 
a notion of admissibility that is reasonable and helpful in this setting. To this end, we 
propose the notion of functional equivalence which states that as a result of the com- 
puted repair, neither erstwhile existing functional behavior will be purged, nor will new 
functional behavior be added. Functional behavior in this sense is represented by lan- 
guages accepted by the untimed automata of the unrepaired and the repaired NTAs. 
Functional equivalence is then defined as equivalence of the languages accepted by 
these automata. We propose a zone-based automaton construction for implementing the 
functional equivalence test that is efficient in practice. 

We have implemented our proposed method in a proof-of-concept tool called TARTAR¹. Our evaluation of TARTAR is based on several non-trivial NTA models taken from the literature, including the frequently considered Pacemaker model [19]. For each model, we automatically generate mutants by injecting clock bound variations which we then model check using UPPAAL and repair using TARTAR. The evaluation shows that our technique is able to compute an admissible repair for 91% of the detected faults.


Related Work. There are relatively few results available on a formal treatment of TDTs. 
The zone based approach to real-time model checking, which relies on a constraint- 
based abstraction of the state space, is proposed in [14]. The use of constraint solving 
to perform reachability analysis for NTAs is described in [30]. This approach ultimately 
leads to the on-the-fly reachability analysis algorithm used in UPPAAL [7]. [12] defines 
the notion of a time-concrete UPPAAL counterexample. Work documented in [27] 
describes the computation of concrete delays for symbolic TDTs. The above cited 
approaches address neither fault analysis nor repair for TDTs. Our use of MaxSMT 
solvers for computing minimal repairs is inspired by the use of MaxSAT solvers for fault
localization in C programs, which was first explored in the BugAssist tool [20,21]. Our 
approach also shares some similarities with syntax-guided synthesis [2,28], which has 
also been deployed in the context of program repair [22]. One key difference is how we 
determine the admissibility of a repair in the overall system, which takes advantage of 
the semantic restrictions imposed by timed automata. 


Structure of the Paper. We will introduce the automata and real-time concepts needed 
in our analysis in Sect. 2. In Sect. 3 we present the logical formalization of TDTs. The 
repair and admissibility analyses are presented in Sects. 4 and 5, respectively. We report 
on tool development, experimental evaluation and case studies in Sect. 6, and Sect. 7 concludes.


¹ TARTAR and links to all models used in this paper can be found at URL https://github.com/sen-uni-kn/tartar.
2 Preliminaries


The timed automaton model that we use in this paper is adapted from [7]. Given a set of clocks $C$, we denote by $B(C)$ the set of all clock constraints over $C$, which are conjunctions of atomic clock constraints of the form $c \sim n$, where $c \in C$, ${\sim} \in \{<, \leq, =, \geq, >\}$ and $n \in \mathbb{N}$. A timed automaton (TA) $T$ is a tuple $T = (L, l^0, C, \Sigma, \Theta, I)$ where $L$ is a finite set of locations, $l^0 \in L$ is an initial location, $C$ is a finite set of clocks, $\Sigma$ is a set of action labels, $\Theta \subseteq L \times B(C) \times \Sigma \times 2^C \times L$ is a set of actions, and $I : L \to B(C)$ denotes a labeling of locations with clock constraints, referred to as location invariants. For $\theta \in \Theta$ with $\theta = (l, g, a, r, l')$ we refer to $g$ as the guard of $\theta$ and to $r$ as its clock resets.

The operational semantics of $T$ is given by a timed transition system consisting of states $s = (l, u)$ where $l$ is a location and $u : C \to \mathbb{R}_{\geq 0}$ is a clock valuation. The initial state $s_0$ is $(l^0, u_0)$ where $u_0$ maps all clocks to 0. For a clock constraint $B$ we write $u \models B$ iff $B$ evaluates to true in $u$. There are two types of transitions. An action transition models the execution of an action whose guard is satisfied. These transitions are instantaneous and reset the specified clocks. The passing of time in a location is modeled by delay transitions. Both types of transitions guarantee that location invariants are satisfied in the pre and post state. Formally, we have $(l, u) \xrightarrow{t} (l', u')$ iff

— (action transition) $t = (l, g, a, r, l') \in \Theta$, $u \models I(l) \wedge g$, $u' \models I(l')$ and for all clocks $c \in C$, $u'(c) = 0$ if $c \in r$ and $u'(c) = u(c)$ otherwise; or

— (delay transition) $t \in \mathbb{R}_{\geq 0}$, $l' = l$, $u \models I(l)$, $u' \models I(l)$ and $u' = u + t$.
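The two transition rules can be sketched directly in Python. This is a minimal illustration, assuming clock valuations are dictionaries from clock names to numbers and clock constraints are lists of `(clock, op, bound)` triples; the helper names (`holds`, `delay_step`, `action_step`) are hypothetical and not taken from the paper or from UPPAAL. The example data encodes the fragment of db from Fig. 2 between reqReceived and reqProcessing.

```python
import operator

# Comparison operators permitted in atomic clock constraints c ~ n.
OPS = {"<": operator.lt, "<=": operator.le, "==": operator.eq,
       ">=": operator.ge, ">": operator.gt}


def holds(u, constraint):
    """u |= B, where B is a conjunction given as a list of (clock, op, bound)."""
    return all(OPS[op](u[c], n) for c, op, n in constraint)


def delay_step(loc, u, t, inv):
    """Delay transition (l, u) --t--> (l, u + t), or None if not enabled."""
    if t < 0:
        return None
    u2 = {c: v + t for c, v in u.items()}
    # As in the rule, only pre and post state are checked; this suffices
    # because invariants are conjunctions of atomic constraints (convex).
    if holds(u, inv.get(loc, [])) and holds(u2, inv.get(loc, [])):
        return loc, u2
    return None


def action_step(loc, u, action, inv):
    """Action transition for action = (l, g, a, r, l'), or None if not enabled."""
    src, guard, _label, resets, dst = action
    if src != loc or not holds(u, inv.get(src, []) + guard):
        return None
    u2 = {c: (0 if c in resets else v) for c, v in u.items()}
    return (dst, u2) if holds(u2, inv.get(dst, [])) else None


# Fragment of db (Fig. 2): invariants and the action reqReceived -> reqProcessing.
INV = {"reqReceived": [("x", "<=", 2)], "reqProcessing": [("y", "<=", 1)]}
TO_PROCESSING = ("reqReceived", [("x", ">=", 1)], "tau", {"y"}, "reqProcessing")

state = delay_step("reqReceived", {"x": 0, "y": 0}, 1.5, INV)
state = action_step(*state, TO_PROCESSING, INV)
print(state)
```

Returning `None` for a disabled transition keeps the sketch short; a delay of 3 time units, for instance, is rejected because it would violate x<=2.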


Definition 1. A symbolic timed trace (STT) of $T$ is a sequence of actions $S = \theta_0, \ldots, \theta_{n-1}$. A realization of $S$ is a sequence of delay values $\delta_0, \ldots, \delta_n$ such that there exist states $s_0, \ldots, s_n, s_{n+1}$ with $s_i \xrightarrow{\delta_i}\xrightarrow{\theta_i} s_{i+1}$ for all $i \in [0, n)$ and $s_n \xrightarrow{\delta_n} s_{n+1}$. We say that an STT is feasible if it has at least one realization.


Property Specification. We focus on the analysis of timed safety properties, which we characterize by an invariant formula that has to hold for all reachable states of a TA. These properties state, for instance, that there are certain locations in which the value of a clock variable is not above, equal to or below a certain (integer) bound. Formally, let $T = (L, l^0, C, \Sigma, \Theta, I)$ be a TA. A timed safety property $\Pi$ is a Boolean combination of atomic clock constraints and location predicates $@l$ where $l \in L$. A location predicate $@l$ holds in a state $(l', u)$ of $T$ iff $l' = l$. We say that an STT $S$ witnesses a violation of $\Pi$ in $T$ if there exists a realization of $S$ whose induced final state does not satisfy $\Pi$. We refer to such an STT as a timed diagnostic trace of $T$ for $\Pi$.

$T$ satisfies $\Pi$ iff all its reachable states satisfy $\Pi$. This problem can be decided
using model checking tools such as Kronos [31] and UPPAAL [6]. UPPAAL in par- 
ticular computes a finite abstraction of the state space of an NTA using a zone graph 
construction. Reachability analysis is then performed by an on-the-fly search of the 
zone graph. If the property is violated, the tool generates a feasible TDT that witnesses 
the violation. The objective of our work is to analyze TDTs and to propose repairs for 
the property violation that they represent. We use TDTs generated by the UPPAAL tool 
in our implementation, but we maintain that our results can be adapted to any other tool 
producing TDTs. 
We further note that UPPAAL takes a network of timed automata (NTA) as input,
which is a CCS [24] style parallel composition of timed automata $T_1 \mid \ldots \mid T_n$. Since
our analysis and repair techniques focus on timing-related errors rather than synchro- 
nization errors, we use TAs rather than NTAs in our formalization. However, our imple- 
mentation works on NTAs. 


Example 1. The running example that we use throughout the paper consists of an NTA of two timed automata, depicted in Fig. 2. As alluded to in the introduction, the TAs dbServer and db synchronize via the exchange of messages modeled by the pairs of send and receive actions req! and req?, respectively, ser! and ser?. The transmission time of the req message is controlled by the clock variable x and can range between 1 and 2 time units. This is achieved by the location invariant x<=2 on the reqReceived location in db together with the transition guard x>=1 on the transition from reqReceived to reqProcessing. A similar mechanism using clock variable z is used to constrain the timing of the transfer of message ser to be within 1 and 2 time units. The processing time in db is constrained to exactly 1 time unit by the location invariant y<=1 and the transition guard y>=1. In dbServer, a transition to location timeout can be triggered when the guard z==2 is satisfied in location serReceiving. The clock variable x, which is not reset until the next req message is sent, records the time that has elapsed since sending req and is used in location serReceiving in order to verify if more than 4 time units have passed since req was sent. The timed safety property that we will consider for our example is $\Pi = \neg @\mathit{dbServer.serReceiving} \vee (x \leq 4)$. For the violation of this property, UPPAAL produces the TDT $S = \theta_0 \ldots \theta_3$ where


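$\theta_0 = ((\mathit{initial}, \mathit{reqAwaiting}), \emptyset, \tau, \emptyset, (\mathit{reqCreate}, \mathit{reqAwaiting}))$

$\theta_1 = ((\mathit{reqCreate}, \mathit{reqAwaiting}), \emptyset, \tau, \{x\}, (\mathit{reqSent}, \mathit{reqReceived}))$

$\theta_2 = ((\mathit{reqSent}, \mathit{reqReceived}), \{x \geq 1\}, \tau, \{y\}, (\mathit{reqSent}, \mathit{reqProc.}))$

$\theta_3 = ((\mathit{reqSent}, \mathit{reqProc.}), \{y \geq 1\}, \tau, \{z\}, (\mathit{serReceiving}, \mathit{reqAwait.}))$.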
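To see Definition 1 and the notion of a violating realization at work on this example, the following Python sketch replays the TDT $S$ for given delays $\delta_0, \ldots, \delta_4$ and evaluates $\Pi$ in the final state. It assumes, based on the description of Fig. 2 above, that the only invariants are x<=2 in reqReceived, y<=1 in reqProcessing and z<=2 in serReceiving; the function names `replay` and `satisfies_pi` are hypothetical.

```python
# Invariants of Fig. 2 as described in the text (assumed to be the only ones):
# local location name -> (clock, upper bound).
INVARIANTS = {"reqReceived": ("x", 2), "reqProcessing": ("y", 1),
              "serReceiving": ("z", 2)}

# TDT S = theta_0 ... theta_3 of Example 1 as (source, guard, resets, target);
# a guard is a list of (clock, lower bound) pairs meaning clock >= bound.
TDT = [
    (("initial", "reqAwaiting"), [], set(), ("reqCreate", "reqAwaiting")),
    (("reqCreate", "reqAwaiting"), [], {"x"}, ("reqSent", "reqReceived")),
    (("reqSent", "reqReceived"), [("x", 1)], {"y"}, ("reqSent", "reqProcessing")),
    (("reqSent", "reqProcessing"), [("y", 1)], {"z"}, ("serReceiving", "reqAwaiting")),
]


def invariant_ok(loc, u):
    """Check the invariants of all local locations in the network location loc."""
    return all(u[c] <= b for name, (c, b) in INVARIANTS.items() if name in loc)


def replay(delays):
    """Final state of the realization given by delta_0..delta_n (Definition 1),
    or None if the delays do not form a realization of S."""
    if len(delays) != len(TDT) + 1:
        raise ValueError("expected n + 1 delays")
    loc, u = TDT[0][0], {"x": 0, "y": 0, "z": 0}
    for j, d in enumerate(delays):
        post = {c: v + d for c, v in u.items()}
        if d < 0 or not (invariant_ok(loc, u) and invariant_ok(loc, post)):
            return None
        u = post
        if j < len(TDT):  # the action theta_j follows the delay delta_j
            src, guard, resets, dst = TDT[j]
            if src != loc or any(u[c] < b for c, b in guard):
                return None
            loc, u = dst, {c: 0 if c in resets else v for c, v in u.items()}
            if not invariant_ok(loc, u):
                return None
    return loc, u


def satisfies_pi(state):
    """Pi = not @dbServer.serReceiving or x <= 4."""
    loc, u = state
    return loc[0] != "serReceiving" or u["x"] <= 4


worst = replay([0, 0, 2, 1, 2])  # slowest req, processing and ser steps
print(worst, satisfies_pi(worst))
```

With the maximal delays, x reaches 5 in serReceiving, so this realization witnesses the violation of $\Pi$; faster choices such as `[0, 0, 1, 1, 1]` do not.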


3 Logical Encoding of Timed Diagnostic Traces 


Our analysis relies on a logical encoding of TDTs in the theory of quantifier-free linear real arithmetic. For the remainder of this paper, we fix a TA $T = (L, l^0, C, \Sigma, \Theta, I)$ with a safety property $\Pi$ and assume that $S = \theta_0, \ldots, \theta_{n-1}$ is an STT of $T$. We use the following notation for our logical encoding where $j \in [0, n+1]$ is a position in a realization of $S$ and $c \in C$ is a clock:

— $l_j$ denotes the location of the pre state of $\theta_j$ for $j < n$ and the location of the post state of $\theta_{j-1}$ for $j = n$.

— $c_j$ denotes the value of clock variable $c$ when reaching the state at position $j$.

— $\delta_j$ denotes the delay of the delay transition leaving the state at position $j \leq n$.

— $reset_j$ denotes the set of clock variables that are being reset by action $\theta_j$ for $j < n$.

— $ibounds(c, l)$ denotes the set of pairs $(\beta, \sim)$ such that the atomic clock constraint $c \sim \beta$ appears in the location invariant $I(l)$.
— $gbounds(c, \theta)$ denotes the set of pairs $(\beta, \sim)$ such that the atomic clock constraint $c \sim \beta$ appears in the guard of action $\theta$.

To illustrate the use of ibounds, assume location $l$ to be labeled with the invariant $x \geq 2 \wedge x < 4 \wedge y < 1$; then $ibounds(x, l) = \{(2, \geq), (4, <)\}$. The usage of gbounds is analogous.

Fig. 2. Network of timed automata - running example: (a) Timed Automaton dbServer; (b) Timed Automaton db (reqReceived with invariant x<=2, reqProcessing with invariant y<=1)


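Definition 2. The timed diagnostic trace constraint system associated with STT $S$ is the conjunction $\mathcal{T}$ of the following constraints:

$$\mathcal{C}_0 = \bigwedge_{c \in C} c_0 = 0 \qquad \text{(clock initialization)}$$

$$\mathcal{A} = \bigwedge_{j \in [0,n]} \delta_j \geq 0 \qquad \text{(time advancement)}$$

$$\mathcal{R} = \bigwedge_{j \in [0,n)} \bigwedge_{c \in reset_j} c_{j+1} = 0 \qquad \text{(clock resets)}$$

$$\mathcal{D} = \bigwedge_{j \in [0,n]} \bigwedge_{c \in C \setminus reset_j} c_{j+1} = c_j + \delta_j \qquad \text{(sojourn time)}$$

$$\mathcal{I} = \bigwedge_{j \in [0,n]} \bigwedge_{c \in C} \bigwedge_{(\beta, \sim) \in ibounds(c, l_j)} c_j \sim \beta \wedge c_j + \delta_j \sim \beta \qquad \text{(location invariants)}$$

$$\mathcal{G} = \bigwedge_{j \in [0,n)} \bigwedge_{c \in C} \bigwedge_{(\beta, \sim) \in gbounds(c, \theta_j)} c_j + \delta_j \sim \beta \qquad \text{(transition guards)}$$

$$\mathcal{L} = @l_n \wedge \bigwedge_{l \neq l_n} \neg @l \qquad \text{(location predicates)}$$

where we let $reset_n = \emptyset$.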

Let further $\Phi = \Pi[c_{n+1}/c]$, where $\Pi[c_{n+1}/c]$ is obtained from $\Pi$ by substituting all occurrences of clocks $c \in C$ with $c_{n+1}$. Then the $\Pi$-extended TDT constraint system associated with $S$ is defined as $\mathcal{T}^\Pi = \mathcal{T} \wedge \neg\Phi$.
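The constraint system of Definition 2 can be generated mechanically from a TDT. The following Python sketch emits the conjuncts of $\mathcal{C}_0$, $\mathcal{A}$, $\mathcal{R}$, $\mathcal{D}$, $\mathcal{I}$ and $\mathcal{G}$ for the TDT of Example 1 as plain strings, with `d{j}` standing for $\delta_j$; it does not produce any solver's input format. The helper names are hypothetical, and it assumes, from the description of Example 1, that the only invariants are x<=2 in reqReceived, y<=1 in reqProcessing and z<=2 in serReceiving. The location predicates $\mathcal{L}$ are omitted; since the final location is serReceiving and $\Pi$ requires $x \leq 4$ there, $\neg\Phi$ simplifies to $x_5 > 4$.

```python
CLOCKS = ["x", "y", "z"]
# Network locations l_0..l_4 visited by the TDT of Example 1 (n = 4).
LOCS = [("initial", "reqAwaiting"), ("reqCreate", "reqAwaiting"),
        ("reqSent", "reqReceived"), ("reqSent", "reqProcessing"),
        ("serReceiving", "reqAwaiting")]
# Invariant bounds per local location as (clock, op, bound); assumed from Fig. 2.
INV = {"reqReceived": [("x", "<=", 2)], "reqProcessing": [("y", "<=", 1)],
       "serReceiving": [("z", "<=", 2)]}
# theta_0..theta_3 as (guard bounds, reset clocks).
ACTIONS = [([], set()), ([], {"x"}), ([("x", ">=", 1)], {"y"}),
           ([("y", ">=", 1)], {"z"})]


def ibounds(loc):
    """All (clock, op, bound) triples of the invariant of a network location."""
    return [b for name in loc for b in INV.get(name, [])]


def tdt_constraints(locs, actions, clocks):
    """Conjuncts of C_0, A, R, D, I and G from Definition 2, as strings."""
    n = len(actions)
    cons = [f"{c}0 = 0" for c in clocks]                        # C_0
    cons += [f"d{j} >= 0" for j in range(n + 1)]                # A
    for j in range(n + 1):
        resets = actions[j][1] if j < n else set()              # reset_n is empty
        for c in clocks:                                         # R and D
            cons.append(f"{c}{j + 1} = 0" if c in resets
                        else f"{c}{j + 1} = {c}{j} + d{j}")
        for c, op, b in ibounds(locs[j]):                        # I
            cons += [f"{c}{j} {op} {b}", f"{c}{j} + d{j} {op} {b}"]
        if j < n:
            for c, op, b in actions[j][0]:                       # G
                cons.append(f"{c}{j} + d{j} {op} {b}")
    return cons


T = tdt_constraints(LOCS, ACTIONS, CLOCKS)
T_PI = T + ["x5 > 4"]  # not Phi, simplified using that l_4 is serReceiving
print("\n".join(T_PI))
```

Among the generated conjuncts are exactly the constraints for $\theta_3$ discussed next: `y3 + d3 >= 1`, `z4 = 0`, `x4 = x3 + d3` and `y4 = y3 + d3`.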


To illustrate the encoding, consider the transition $\theta_3$ of the TDT in Example 1 corresponding to the transition from state (reqSent, reqProcessing) to state (serReceiving, reqAwaiting) while resetting clock z in the NTA of Fig. 2. The encoding for the constraints on the clocks x, y and z is as follows: $y_3 + \delta_3 \geq 1$, $z_4 = 0$, $x_4 = x_3 + \delta_3$ and $y_4 = y_3 + \delta_3$.


Lemma 1. $\hat{\delta}_0, \ldots, \hat{\delta}_n$ is a realization of an STT $S$ iff there exists a satisfying variable assignment $\iota$ for $\mathcal{T}$ such that for all $j \in [0, n]$, $\iota(\delta_j) = \hat{\delta}_j$.


Theorem 1. An STT $S$ witnesses a violation of $\Pi$ in $T$ iff $\mathcal{T}^\Pi$ is satisfiable.


4 Repair 


We propose a repair technique that analyzes the responsibility of clock bound values occurring in a single TDT for causing the violation of a specification $\Pi$. The analysis suggests possible syntactic repairs. In a second step we define an admissibility test that assesses the admissibility of the repair in the context of the complete TA model. Throughout this section, we assume that $S$ is a TDT for $T$ and $\Pi$.


Clock Bound Variation. We introduce bound variation variables $v$ that stand for correction values that the repair will add to the clock bounds occurring in location invariants and transition guards. The values are chosen such that none of the realizations of $S$ in the modified automaton still witnesses a violation of $\Pi$. This is done by defining a new constraint system that captures the conditions on the variables $v$ under which the violation of $\Pi$ will not occur in the corresponding trace of the modified automaton. Using this constraint system, we then define a maximum satisfiability problem whose solution minimizes the number of changes to $T$ that are needed to achieve the repair.

Recall that the clock bounds occurring in location invariants and in transition guards are represented by the ibounds and gbounds sets defined for the TDT $S$. Notice that each clock variable $c$ may be associated with $m_{c,l}$ different clock bounds in the location invariant of $l$, denoted by the set $ibounds(c, l) = \{(\beta_1^{c,l}, \sim_1^{c,l}), \ldots, (\beta_{m_{c,l}}^{c,l}, \sim_{m_{c,l}}^{c,l})\}$. Similarly, we enumerate the bounds in $gbounds(c, \theta)$ as $(\beta_i^{c,\theta}, \sim_i^{c,\theta})$. To reduce notational clutter, we let the meta variable $r$ stand for the pairs of the form $c, l$ or $c, \theta$. We then introduce bound variation variables $v_i^r$ describing the possible static variation in the TA code for the clock bound $\beta_i^r$ and modify the TDT constraint system accordingly. A variation of the bounds only affects the location invariant constraints $\mathcal{I}$ and the transition guard constraints $\mathcal{G}$. We thus define an appropriate invariant variation constraint $\mathcal{I}^v$ and guard variation constraint $\mathcal{G}^v$ that capture the clock bound modifications:


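$$\mathcal{I}^v = \bigwedge_{j \in [0,n]} \bigwedge_{c \in C} \bigwedge_{(\beta_i^{c,l_j}, \sim_i^{c,l_j}) \in ibounds(c, l_j)} c_j \sim_i^{c,l_j} (\beta_i^{c,l_j} + v_i^{c,l_j}) \wedge c_j + \delta_j \sim_i^{c,l_j} (\beta_i^{c,l_j} + v_i^{c,l_j})$$

$$\mathcal{G}^v = \bigwedge_{j \in [0,n)} \bigwedge_{c \in C} \bigwedge_{(\beta_i^{c,\theta_j}, \sim_i^{c,\theta_j}) \in gbounds(c, \theta_j)} c_j + \delta_j \sim_i^{c,\theta_j} (\beta_i^{c,\theta_j} + v_i^{c,\theta_j})$$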


86 M. Kölbl et al. 


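We also need constraints ensuring that the modified clock bounds remain non-negative:

$$\mathcal{Z}^v = \bigwedge_{j} \bigwedge_{c \in C} \bigwedge_{(\beta_i^r, \sim_i^r) \in ibounds(c, l_j) \cup gbounds(c, \theta_j)} \beta_i^r + v_i^r \geq 0$$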


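Putting all of this together we obtain the bound variation TDT constraint system

$$\mathcal{T}^v = \mathcal{C}_0 \wedge \mathcal{A} \wedge \mathcal{R} \wedge \mathcal{D} \wedge \mathcal{I}^v \wedge \mathcal{G}^v \wedge \mathcal{Z}^v \wedge \mathcal{L}$$

which captures all realizations of $S$ in TAs $T^v$ that are obtained from $T$ by modifying the clock bounds $\beta_i^r$ by some semantically consistent variations $v_i^r$.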

Consider the bound variation for the guard $y \geq 1$ of transition $\theta_3$ in Example 1. The modified guard constraint, a conjunct in $\mathcal{G}^v$, is $y_3 + \delta_3 \geq 1 + v_1^{y,\theta_3}$. The corresponding non-negativity constraint from $\mathcal{Z}^v$ is $1 + v_1^{y,\theta_3} \geq 0$.


Repair by Bound Variation Analysis. The objective of the bound variation analysis is to provide hints to the system designer regarding which minimal syntactic changes to the considered model might prevent the violation of property $\Pi$. Minimality here is considered with respect to the number of clock bound values in invariants and guards that need to be changed.

We implement this analysis by using the bound variation TDT constraint system $\mathcal{T}^v$ to derive an instance of the partial MaxSMT problem whose solutions yield candidate repairs for the timed automaton $T$. The partial MaxSMT problem takes as input a finite set of assertion formulas belonging to a fixed first-order theory. These assertions are partitioned into hard and soft assertions. The hard assertions $F_H$ are assumed to hold and the goal is to find a maximizing subset $F' \subseteq F_S$ of the soft assertions such that $F' \cup F_H$ is satisfiable in the given theory.

For our analysis, the hard assertions consist of the conjunction

$$F_H^v = (\exists \delta_j, c_j.\ \mathcal{T}^v) \wedge (\forall \delta_j, c_j.\ \mathcal{T}^v \Rightarrow \Phi).$$

Note that the free variables of $F_H^v$ are exactly the bound variation variables $v_i^r$. Given a satisfying assignment $\iota$ for $F_H^v$, let $T_\iota$ be the timed automaton obtained from $T$ by adding to each clock bound $\beta_i^r$ the according variation value $\iota(v_i^r)$, and let $S_\iota$ be the TDT corresponding to $S$ in $T_\iota$. Then $F_H^v$ guarantees that

1. $S_\iota$ is feasible, and
2. $S_\iota$ has no realization that witnesses a violation of $\Pi$ in $T_\iota$.


We refer to such an assignment $\iota$ as a local clock bound repair for $T$ and $S$. To obtain a minimal local clock bound repair, we use the soft assertions given by the conjunction

$$F_S^v = \bigwedge_{j} \bigwedge_{c \in C} \bigwedge_{(\beta_i^r, \sim_i^r) \in ibounds(c, l_j) \cup gbounds(c, \theta_j)} v_i^r = 0.$$


Clearly $F_H^v \wedge F_S^v$ is unsatisfiable because $\mathcal{T}^v \wedge F_S^v$ is equisatisfiable with $\mathcal{T}$, and $\mathcal{T} \wedge \neg\Phi$ is satisfiable by assumption. However, if there exists at least one local clock bound repair for $T$ and $S$, then $F_H^v$ alone is satisfiable. In this case, the MaxSMT instance $F_H^v \cup F_S^v$ has at least one solution. Every satisfying assignment of such a solution corresponds to a local repair that minimizes the number of clock bounds that need to be changed in $T$.

Note that hard and soft assertions remain within a decidable logic. Using an SMT 
solver such as Z3, we can enumerate all the optimal solutions for the partial MaxSMT 
instance and obtain a minimal local clock bound repair from each of them. 


Example 2. We have applied the bound variation repair analysis to the TDT from Example 1, using TARTAR, which calls Z3. The following repairs were computed:

1. $v_1^{z,l_5} = -1$. This corresponds to a variation of the location invariant regarding clock z in location 5 of the TDT, corresponding to location dbServer.serReceiving, to read $z \leq 1$ instead of $z \leq 2$. This indicates that the violation of the bound on the total duration of the transaction, as indicated by a return to the serReceiving location and a value greater than 4 for clock x, can be avoided by ensuring that the time taken for transmitting the ser message to the dbServer is constrained to take exactly 1 time unit.

2. A further computed repair lowers the bound of the invariant on clock x in location db.reqReceived by 1, i.e., to $x \leq 1$ instead of $x \leq 2$. Interpreting this variation in the context of Example 1 means that location db.reqReceived will be left when the clock x has value 1. In other words, the transmission of the message req to the db takes exactly one time unit, not between 1 and 2 time units as in the unrepaired model.

3. Another possible repair implies the modification of two clock bounds. This is no 
longer an optimal solution and no further optimal solution exists. Notice that even 
non-optimal solutions might provide helpful insight for the designer, for instance if 
optimal repairs turn out not to be implementable, inadmissible or leading to a prop- 
erty violation. It is therefore meaningful to allow a practical tool implementation to 
compute more than just the optimal repairs. 
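For this small example the optimization can be mimicked without an SMT solver. The Python sketch below is a brute-force stand-in for the partial MaxSMT query, not TARTAR's Z3-based procedure: it enumerates variations of the five clock bounds of the TDT over a finite grid of half-units, keeps those satisfying the hard assertions, and, for the smallest number of changed bounds, returns the least-magnitude variation per set of bounds. It exploits a property specific to this trace (x is reset by $\theta_1$, so its final value is $\delta_2 + \delta_3 + \delta_4$, and each of these delays ranges over an independent interval), and the bound names are hypothetical. Under these assumptions it recovers the two single-bound repairs listed above.

```python
from fractions import Fraction
from itertools import combinations, product

# Clock bounds occurring in the TDT of Example 1 (hypothetical names).
BOUNDS = {
    "inv_x@reqReceived": 2,    # invariant x <= 2
    "guard_x@theta2": 1,       # guard x >= 1
    "inv_y@reqProcessing": 1,  # invariant y <= 1
    "guard_y@theta3": 1,       # guard y >= 1
    "inv_z@serReceiving": 2,   # invariant z <= 2
}
DEADLINE = 4  # Pi requires x <= 4 in serReceiving
GRID = [Fraction(k, 2) for k in range(-4, 5) if k != 0]  # non-zero candidates


def hard_assertions_hold(v):
    """F_H for this trace: the modified TDT is feasible and every realization
    ends with x <= DEADLINE. v maps bound names to variations (default 0)."""
    b = {name: BOUNDS[name] + v.get(name, 0) for name in BOUNDS}
    if any(val < 0 for val in b.values()):  # Z^v: bounds stay non-negative
        return False
    # x is reset by theta_1, so its final value is delta_2 + delta_3 + delta_4;
    # each delay ranges over an interval [lower, upper] given by guard/invariant.
    intervals = [(b["guard_x@theta2"], b["inv_x@reqReceived"]),
                 (b["guard_y@theta3"], b["inv_y@reqProcessing"]),
                 (0, b["inv_z@serReceiving"])]
    feasible = all(lo <= hi for lo, hi in intervals)
    return feasible and sum(hi for _, hi in intervals) <= DEADLINE


def minimal_repairs():
    """For the smallest number k of changed bounds admitting a repair, map each
    k-set of bounds to its least-magnitude repair on GRID."""
    for k in range(1, len(BOUNDS) + 1):
        found = {}
        for names in combinations(BOUNDS, k):
            repairs = [dict(zip(names, vals)) for vals in product(GRID, repeat=k)
                       if hard_assertions_hold(dict(zip(names, vals)))]
            if repairs:
                found[names] = min(repairs, key=lambda r: sum(map(abs, r.values())))
        if found:
            return found
    return {}


print(minimal_repairs())
```

Unlike a MaxSMT solver, this enumeration only explores a finite grid and relies on the interval structure of this particular trace, so it does not generalize beyond the example.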


5 Admissibility of Repair 


The synthesized repairs that lead to a TA $T_\iota$ change the original TA $T$ in fundamental ways, both syntactically and semantically. This raises the question whether the synthesized repairs are admissible. In fact, one of the key questions is what notion of admissibility is meaningful in this context.

A timed trace [7] is a sequence of timed actions $\xi = (t_1, a_1), (t_2, a_2), \ldots$ that is generated by a run of a TA, where $t_i \leq t_{i+1}$ for all $i \geq 1$. The timed language for a TA $T$ is the set of all its timed traces, which we denote by $\mathcal{L}_T(T)$. The untimed language of $T$ consists of words over $T$'s alphabet $\Sigma$ so that there exists at least one timed trace of $T$ forming this word. Formally, for a timed trace $\xi = (t_1, a_1), (t_2, a_2), \ldots$, the untime operator $\mu(\xi)$ returns an untimed trace $\xi_\mu = a_1 a_2 \ldots$. We define the untimed language $\mathcal{L}_\mu(T)$ of the TA $T$ as $\mathcal{L}_\mu(T) = \{\mu(\xi) \mid \xi \in \mathcal{L}_T(T)\}$.
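On finite prefixes, the untime operator $\mu$ is a simple projection. The following Python sketch (hypothetical names; actual timed languages are infinite and cannot be enumerated like this) checks that the time stamps are non-decreasing and then drops them, and applies $\mu$ to a small finite set of timed traces.

```python
def untime(timed_trace):
    """mu: map a finite timed trace (t_1, a_1), (t_2, a_2), ... to a_1 a_2 ..."""
    times = [t for t, _ in timed_trace]
    if any(t1 > t2 for t1, t2 in zip(times, times[1:])):
        raise ValueError("time stamps of a timed trace must be non-decreasing")
    return tuple(a for _, a in timed_trace)


def untimed_language(timed_traces):
    """{mu(xi) | xi in traces} for a finite set of finite timed traces."""
    return {untime(xi) for xi in timed_traces}


traces = [[(0.0, "req"), (1.5, "ser")], [(0.0, "req"), (2.0, "ser")],
          [(0.0, "req"), (2.0, "timeout")]]
print(untimed_language(traces))
```

Two timed traces that differ only in their time stamps collapse to the same untimed word, which is why $\mathcal{L}_\mu$ abstracts from timing while still reflecting which action sequences are reachable.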

Let $B$ be a Büchi automaton (BA) [10] over some alphabet $\Sigma$. We write $\mathcal{L}(B) \subseteq \Sigma^\omega$ for the language accepted by $B$. Similarly, we denote by $\mathcal{L}_f(B) \subseteq \Sigma^*$ the language accepted by $B$ if it is interpreted as a nondeterministic finite automaton (NFA). Further, we write $pref(\mathcal{L}(B))$ to denote the set of all finite prefixes of words in $\mathcal{L}(B)$.
For a given NFA or BA $M$, the closure $cl(M)$ denotes the automaton obtained from $M$ by turning all of its states into accepting states. We call $M$ closed iff $M = cl(M)$. Notice that every closed Büchi automaton accepts a safety language, and every ω-regular safety language is accepted by some closed Büchi automaton [1].


Admissibility Criteria. From a syntactic point of view, the repair obtained from a satisfying assignment $\iota$ of the MaxSMT instance ensures that $T_\iota$ is a syntactically valid TA model by, for instance, placing non-negativity constraints on repaired clock bounds. In case repairs alter right hand sides of clock constraints to rational numbers, this can easily be fixed by normalizing all clock constraints in the TA.

From a semantic perspective, the impact of the repairs is more profound. Since the repairs affect time bounds in location invariants and transition guards, as well as clock resets, the behavior of $T_\iota$ may be fundamentally different from the behavior of $T$.


— First, the computed repair for one property $\Pi$ may render another property $\Pi'$ violated. To check admissibility of the synthesized repair with respect to the set $\mathbf{\Pi}$ of all properties in the system specification, a full re-checking of $\mathbf{\Pi}$ is necessary.

— Second, a repair may have introduced zenoness and timelocks [4] into $T_\iota$. As discussed in [4], there exist both an over-approximating static test for zenoness and a model checking based precise test for timelocks that can be used to verify whether the repair is admissible in this regard.

— Third, due to changes in the possible assignment of time values to clocks, reachable locations in the TA $T$ may become unreachable in $T_\iota$, and vice versa. On the one hand, this means that some functionalities of the system may no longer be provided since part of the actions in $T$ will no longer be executable in $T_\iota$, and vice versa. Further, a reduction in the set of reachable locations in $T_\iota$ compared to $T$ may mean that certain locations with property violations in $T$ are no longer reachable in $T_\iota$, which implies that certain property violations are masked by a repair instead of being fixed. On the other hand, the repair leading to locations becoming reachable in $T_\iota$ that were unreachable in $T$ may have the effect that previously unobserved property violations become visible and that $T_\iota$ possesses functionality that $T$ does not have, which may or may not be desirable.


It should be pointed out that we assess admissibility of a repair leading to $T_\iota$ with respect to a given TA model $T$, and not with respect to a correct TA model $T^*$ satisfying $\Pi$.


Functional Equivalence. While various variants of semantic admissibility may be considered, we focus on a notion of admissibility that ensures that a repair does not unduly change the functional behavior of the modeled system while adhering to the timing constraints of the repaired system. We refer to this as functional equivalence. The functional capabilities of a timed system manifest themselves in the sets of action or transition traces that the system can execute. For TAs $T$ and $T_\iota$, this means that we need to consider the languages over the action or transition alphabets that these TAs define. Considering the timed languages of $T$ and $T_\iota$, we can state that $\mathcal{L}_T(T) \neq \mathcal{L}_T(T_\iota)$ since the repair forces at least one timed trace to be purged from $\mathcal{L}_T(T)$. This means that equivalence of the timed languages cannot be an admissibility criterion ensuring functional equivalence. At the other end of the spectrum we may relate the de-timed languages of $T$ and $T_\iota$. The de-time operator $\alpha(T)$ is defined such that it omits all timing constraints and resets from any TA $T$. Requiring $\mathcal{L}(\alpha(T)) = \mathcal{L}(\alpha(T_\iota))$ is tempting since it states that when eliminating all timing related features from $T$ and from the repaired $T_\iota$, the resulting action languages will be identical.

However, this admissibility criterion would be flawed, since the repair in $T_\iota$ may imply that unreachable locations in $T$ will be reachable in $T_\iota$, and vice versa. This may have an impact on the untimed languages, and even though $\mathcal{L}(\alpha(T)) = \mathcal{L}(\alpha(T_\iota))$ it may be that $\mathcal{L}_\mu(T) \neq \mathcal{L}_\mu(T_\iota)$. To illustrate this point, consider the running example in Fig. 2 and assume the invariant in location dbServer.serReceiving to be modified from $z \leq 2$ to $z \leq 1$ in the repaired TA $T_\iota$. Applying the de-time operator to $T_\iota$ implies that the location dbServer.timeout, which is unreachable in $T_\iota$, becomes reachable in the de-timed model. Since dbServer.timeout is reachable in $T$, the TAs $T$ and $T_\iota$ are not functionally equivalent, even though their de-timed languages are identical. Notice that $\mathcal{L}_\mu(T) \neq \mathcal{L}_\mu(T_\iota)$ holds for the untimed languages since no timed trace in $\mathcal{L}_T(T_\iota)$ reaches location timeout, even though such a timed trace exists in $\mathcal{L}_T(T)$. In detail, $\mathcal{L}_\mu(T)$ contains the untimed trace $\theta_0\theta_1\theta_2\theta_3\theta_4$ that is missing in $\mathcal{L}_\mu(T_\iota)$, where $\theta_4$ is the transition towards the location dbServer.timeout. As a consequence, we resort to considering the untimed languages of $T$ and $T_\iota$ and require $\mathcal{L}_\mu(T) = \mathcal{L}_\mu(T_\iota)$. It is easy to see that $\mathcal{L}_\mu(T) = \mathcal{L}_\mu(T_\iota) \Rightarrow \mathcal{L}(\alpha(T)) = \mathcal{L}(\alpha(T_\iota))$. In other words, the equivalence of the untimed languages ensures functional equivalence.


Admissibility Test. Designing an algorithmic admissibility test for functional equivalence is challenging due to the computational complexity of determining the equivalence of the untimed languages $\mathcal{L}_\mu(T)$ and $\mathcal{L}_\mu(T_\iota)$. While language equivalence is decidable for languages defined by Büchi automata, it is undecidable for timed languages [3]. For untimed languages, however, this problem is again decidable [3]. The algorithmic implementation of the test for functional equivalence that we propose proceeds in two steps.

— First, the untimed languages $\mathcal{L}_\mu(T)$ and $\mathcal{L}_\mu(T_\iota)$ are constructed. This requires an untime transformation of $T$ and $T_\iota$ yielding Büchi automata representing $\mathcal{L}_\mu(T)$ and $\mathcal{L}_\mu(T_\iota)$. While the standard untime transformation for TAs [3] relies on a region construction, we propose a transformation that relies on a zone construction [14]. This will provide a more succinct representation of the resulting untimed languages and, hence, a more efficient equivalence test.

— Second, it needs to be determined whether $\mathcal{L}_\mu(T) = \mathcal{L}_\mu(T_\iota)$. As we shall see, the obtained Büchi automata are closed. Hence, we can reduce the equivalence problem for these ω-regular languages to checking equivalence of the regular languages obtained by taking the finite prefixes of the traces in $\mathcal{L}_\mu(T)$ and $\mathcal{L}_\mu(T_\iota)$. This allows us to interpret the Büchi automata obtained in the first step as NFAs, for which the language equivalence check is a standard construction [15].
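One common way to implement the NFA equivalence check of the second step is sketched below in Python under our own naming; it is not taken from TARTAR or from [15]. It explores pairs of subset-construction states breadth-first and reports the shortest word on which acceptance differs. For closed automata every state is accepting, so a set of reached states accepts exactly when it is non-empty. The example automata are hypothetical and only loosely mirror the timeout behavior discussed above.

```python
from collections import deque


def nfa_equivalent(nfa1, nfa2, alphabet):
    """Decide L_f(nfa1) == L_f(nfa2). An NFA is a triple (initial states,
    delta, accepting states) with delta mapping (state, letter) to a set of
    successors. Returns (True, None) or (False, shortest distinguishing word)."""
    def step(nfa, states, a):
        delta = nfa[1]
        return frozenset(q for s in states for q in delta.get((s, a), ()))

    def accepts(nfa, states):
        return bool(states & frozenset(nfa[2]))

    start = (frozenset(nfa1[0]), frozenset(nfa2[0]))
    seen, queue = {start}, deque([(start, ())])
    while queue:
        (s1, s2), word = queue.popleft()
        if accepts(nfa1, s1) != accepts(nfa2, s2):
            return False, word
        for a in alphabet:
            nxt = (step(nfa1, s1, a), step(nfa2, s2, a))
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, word + (a,)))
    return True, None


SIGMA = ["req", "ser", "timeout"]
# Closed automata (all states accepting).
A = ({0}, {(0, "req"): {1}, (1, "ser"): {0}}, {0, 1})
A_COPY = ({0}, {(0, "req"): {1, 2}, (1, "ser"): {0}, (2, "ser"): {0}}, {0, 1, 2})
A_TIMEOUT = ({0}, {(0, "req"): {1}, (1, "ser"): {0}, (1, "timeout"): {3}},
             {0, 1, 3})
print(nfa_equivalent(A, A_COPY, SIGMA), nfa_equivalent(A, A_TIMEOUT, SIGMA))
```

Exploring the product on the fly avoids building both deterministic automata up front, and the breadth-first order yields a shortest counterexample word, which can be reported to the designer when a repair is inadmissible.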


Automata for Untimed Languages. The construction of an automaton representing an 
untimed language, here referred to as an untime construction, has so far been proposed 
based on a region abstraction [3]. The region abstraction is known to be relatively inef- 
ficient since the number of regions is, among other things, exponential in the number of 
clocks [4]. We therefore propose an untime construction based on the construction of
a zone automaton [14] which in the worst case is of the same complexity as the region 
automaton, but on the average is more succinct [7]. 


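Definition 3 (Untimed Büchi Automaton). Assume a TA $T$ and the corresponding zone automaton $[\![T]\!]_Z = (S_Z, s_Z^0, \Sigma_Z, \Theta_Z)$. We define the untimed Büchi automaton as the closed BA $B_T = (S, \Sigma, \rightarrow, S_0, F)$ obtained from $[\![T]\!]_Z$ such that $S = S_Z$, $\Sigma = \Sigma_Z \setminus \{\delta\}$ and $S_0 = \{s_Z^0\}$. For every transition in $\Theta_Z$ with a label $a \in \Sigma$ we add a transition to $\rightarrow$ created by the rule

$$\frac{(l, z) \xrightarrow{a}_Z (l', z')}{(l, z) \xrightarrow{a} (l', z'^{\uparrow})}$$

where $z^{\uparrow} = \{v + d \mid v \in z, d \in \mathbb{R}_{\geq 0}\}$. In addition, we add self-transitions $(l, z) \xrightarrow{\tau} (l, z)$ to every state $(l, z) \in S_Z$.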
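The operation $z^{\uparrow}$ used in the rule is the standard future (or up) operation on zones. Zone-based tools commonly represent zones as difference bound matrices (DBMs); the following Python sketch assumes that representation, which this definition does not prescribe, ignores the distinction between strict and non-strict bounds, and uses hypothetical names. In a canonical DBM, where entry `[i][j]` bounds $x_i - x_j$ and index 0 is a reference clock fixed to 0, the future of a zone is obtained by removing the upper bounds $x_i - x_0 \leq c$.

```python
import math

INF = math.inf


def up(dbm):
    """z^up = {v + d | v in z, d >= 0} for a canonical DBM: drop upper bounds."""
    result = [row[:] for row in dbm]
    for i in range(1, len(result)):
        result[i][0] = INF  # x_i - x_0 <= INF, i.e., no upper bound on x_i
    return result


def contains(dbm, valuation):
    """Check x_i - x_j <= dbm[i][j] for all i, j, with x_0 = 0."""
    x = [0] + list(valuation)
    return all(x[i] - x[j] <= dbm[i][j]
               for i in range(len(x)) for j in range(len(x)))


# Zone over clocks (x, y): 0 <= x <= 1 and x - y = 0 (both reset together).
Z = [[0, 0, 0],
     [1, 0, 0],
     [1, 0, 0]]
Z_UP = up(Z)
print(contains(Z, (5, 5)), contains(Z_UP, (5, 5)), contains(Z_UP, (5, 4)))
```

Letting time pass removes the upper bound on each clock but keeps the difference constraints, so the valuation (5, 5) lies in the future of the zone while (5, 4) does not.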


The following observations justify this definition:


- A timed trace of $T$ may remain forever in the same location after a finite number of
action transitions. In order to enable $B_T$ to accept this trace, we add a self-transition
labeled with $\tau$ to $\rightarrow$ for each state $s \in S$ in $B_T$, and later define $s$ as accepting.
These $\tau$-self-transitions extend every finite timed trace $t$ leading to a state in $S$ to
an infinite trace $t \cdot \tau^{\omega}$.

- The construction of the acceptance set $F$ is more intricate. Convergent traces are
often excluded from consideration in real-time model checking [4]. As a consequence,
in the untime construction proposed in [3], only a subset of the states in $S$
may be included in $F$. A repair may turn a subgraph of the location graph of $T$
that is only reachable by divergent traces into a subgraph of $T_r$ that is only reachable
by convergent traces. However, excluding convergent traces is only meaningful
when considering unbounded liveness properties, but not when analyzing timed
safety properties, which in effect are safety properties. As argued in [7], unbounded
liveness properties appear to be less important than timed safety properties in timed
systems. This is due to the observation that convergent traces reflect unrealistic behavior
in the limit, but finite prefixes of infinite convergent traces, which only need to be
considered for timed safety properties, correspond to realistic behavior. This observation
is also reflected in the way in which, e.g., UPPAAL treats reachability by
convergent traces. In conclusion, this justifies our choice to define the zone automaton
in the untime construction as a closed BA, i.e., $F = S$.
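The construction in Definition 3 can be illustrated with a small Python sketch. It assumes an explicit zone automaton given as a set of states and a set of labelled transitions, uses the hypothetical labels "delta" for delay transitions and "tau" for the added self-loops, and assumes the zones are already delay-closed, so that the $z^{\uparrow}$ rule reduces to copying the action transitions. It is an illustration of the definition, not TARTAR's implementation.

```python
from dataclasses import dataclass
from typing import Hashable, Set, Tuple

DELTA = "delta"  # hypothetical label of delay transitions in the zone automaton
TAU = "tau"      # label of the self-loops added by the construction

Transition = Tuple[Hashable, str, Hashable]  # (source, label, target)


@dataclass
class BuchiAutomaton:
    states: Set[Hashable]
    alphabet: Set[str]
    transitions: Set[Transition]
    initial: Set[Hashable]
    accepting: Set[Hashable]


def untimed_buchi(states, initial, alphabet, transitions) -> BuchiAutomaton:
    """Closed BA from a zone automaton whose zones are assumed to be delay-closed."""
    # Sigma = Sigma_Z without the delay label
    sigma = set(alphabet) - {DELTA}
    # copy the action transitions; delays are absorbed by the delay-closed zones
    trans = {(s, a, t) for (s, a, t) in transitions if a != DELTA}
    # a tau self-loop on every state extends a finite trace t to t . tau^omega
    trans |= {(s, TAU, s) for s in states}
    return BuchiAutomaton(
        states=set(states),
        alphabet=sigma,
        transitions=trans,
        initial={initial},
        accepting=set(states),  # closed BA: F = S
    )
```

Because every state is accepting, the resulting automaton accepts exactly the infinite words whose runs never get stuck, which is what makes the prefix-based check of Theorem 3 applicable.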


Theorem 2 (Correctness of Untimed Büchi Automaton Construction). For an
untimed Büchi automaton $B_T$ derived from a TA $T$ according to Definition 3 it holds
that $\mathcal{L}(B_T) = \mathcal{L}_u(T)$.


Equivalence Check for Untimed Languages. Given that the zone automaton construction
delivers closed BAs, we can reduce the admissibility test $\mathcal{L}_u(T) = \mathcal{L}_u(T_r)$, defined
over languages of infinite words, to an equivalence test over the finite prefixes of these languages,
represented by interpreting the zone automata as NFAs. The following theorem justifies
this reduction.


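Theorem 3 (Language Equivalence of Closed BA). Given closed Büchi automata $B$
and $B'$, if $\mathcal{L}_f(B) = \mathcal{L}_f(B')$ then $\mathcal{L}(B) = \mathcal{L}(B')$, where $\mathcal{L}_f$ denotes the language of finite prefixes.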
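Because every state of a closed BA is accepting, the finite-prefix check behind Theorem 3 amounts to NFA language equivalence in which a set of states accepts iff it is non-empty. The following Python sketch performs this check by exploring pairs of subset-construction states on the fly. TARTAR itself uses AutomataLib for this step; the function names and the transition-set encoding here are hypothetical.

```python
from collections import deque
from typing import FrozenSet, Hashable, Iterable, Set, Tuple

Transition = Tuple[Hashable, str, Hashable]  # (source, label, target)


def post(trans: Set[Transition], states: FrozenSet, a: str) -> FrozenSet:
    """Successor set of `states` under label `a` (one subset-construction step)."""
    return frozenset(t for (s, b, t) in trans if s in states and b == a)


def prefix_languages_equal(init1: Iterable, trans1: Set[Transition],
                           init2: Iterable, trans2: Set[Transition],
                           alphabet: Iterable[str]) -> bool:
    """Decide L_f(B1) == L_f(B2) for automata in which every state is accepting."""
    alphabet = list(alphabet)
    start = (frozenset(init1), frozenset(init2))
    seen = {start}
    todo = deque([start])
    while todo:
        x, y = todo.popleft()
        # with F = S, the word read so far is accepted iff the state set is non-empty
        if bool(x) != bool(y):
            return False
        for a in alphabet:
            pair = (post(trans1, x, a), post(trans2, y, a))
            if pair not in seen:
                seen.add(pair)
                todo.append(pair)
    return True
```

The search visits at most the product of the two subset constructions, which is exponential in the worst case; action-deterministic models keep every subset a singleton or empty.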
Discussion. One may want to adapt the admissibility test so that it only considers
divergent traces, e.g., in cases where only unbounded liveness properties need to be
preserved by a repair. This can be accomplished as follows. First, an overapproximating
non-zenoness test [4] can be applied to $T$ and $T_r$. If it shows non-zenoness, then
one knows that the respective TA does not include convergent traces. If this test fails,
a more expensive test is required. It requires a construction of the untimed
Büchi automata using the approach from [3], and subsequently a language equivalence
test of the untimed languages accepted by the untimed BAs using, for instance, the
automata-theoretic constructions proposed in [9].


6 Case Studies and Experimental Evaluation 


We have implemented the repair computation and admissibility test in a proof-of-concept
tool called TARTAR. We present the architecture of TARTAR and then evaluate
the proposed method by applying TARTAR to several case studies.


Tool Architecture. The control loop of TARTAR, depicted in Fig. 3, computes repairs
for a given UPPAAL model and a given property $\Pi$ using the following steps:


1. Counterexample Creation. TARTAR calls UPPAAL with parameters to compute and
store a shortest symbolic TDT in XML format in case $\Pi$ is violated.

2. Diagnostic Trace Creation. Parsing the model and the TDT, TARTAR creates the
diagnostic trace encoding as defined in Sect. 4. Z3 can only solve the MaxSMT problem for quantifier-free
linear real arithmetic. Hence, TARTAR first performs a quantifier elimination on the
quantified constraints of this encoding.

3. Repair Computation. Next, TARTAR attempts to compute a repair, by using Z3 to 
solve the generated quantifier-free MaxSMT instance. In case no solution is found, 
TARTAR terminates. Otherwise, TARTAR returns the repair that has been computed 
from the model of the MaxSMT solution. 

4. Admissibility Check. Using adapted routines provided by the opaal model 
checker [11], TARTAR checks the admissibility of the computed repair. To do so, 
TARTAR modifies the constraints of the considered UPPAAL model as indicated 
by the computed repair. It calls opaal in order to compute the timed transition sys- 
tem (TTS) of the original and the repaired UPPAAL model. TARTAR then checks 
whether the two TTS have equivalent untimed languages, in which case the repair 
is admissible. This check is implemented using the library AutomataLib included in 
the package LearnLib [16].

5. Iteration. TARTAR is designed to enumerate all repairs, starting with the minimal
ones, in an iterative loop. To accomplish this, at the end of each iteration $i$ a new MaxSMT instance
is generated by forcing the bound variation variables that were used in the $i$-th repair
to 0. This excludes the repair computed in iteration $i$ from further consideration.
Using this new instance, TARTAR iterates back to Step 3 to compute another repair.
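Steps 3 to 5 can be sketched as a loop around a MaxSMT call. In this Python sketch the solver is abstracted as a hypothetical callback `solve(forced_zero)` that returns a minimal repair (a map from bound variation variables to non-zero changes) or `None`, and `is_admissible` stands in for the admissibility check of Step 4. Neither corresponds to an actual Z3 or TARTAR API.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Repair = Dict[str, float]  # bound variation variable -> change of the clock bound


def enumerate_repairs(
    solve: Callable[[Set[str]], Optional[Repair]],
    is_admissible: Callable[[Repair], bool],
    max_iterations: int = 100,
) -> List[Tuple[Repair, bool]]:
    """Enumerate repairs, forcing the variables of each found repair to 0 afterwards."""
    forced_zero: Set[str] = set()
    results: List[Tuple[Repair, bool]] = []
    for _ in range(max_iterations):
        repair = solve(forced_zero)  # Step 3: solve the current MaxSMT instance
        if repair is None:
            break  # no further repair exists
        used = {v for v, change in repair.items() if change != 0}
        if not used:
            break  # guard: an empty repair would be found again forever
        results.append((repair, is_admissible(repair)))  # Step 4
        forced_zero |= used  # Step 5: exclude this repair from later iterations
    return results
```

For example, a toy solver that returns the first candidate repair not touching any forced variable plays the role of the MaxSMT solver; once the variables of a repair are forced to 0, every later candidate that reuses one of them is excluded as well.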
Evaluation Strategy. The evaluation of our analysis is based on ideas taken from
mutation testing [18]. Mutation testing evaluates a test set by systematically modifying
the program code to be tested and computing the ratio of modifications that are detected
by the test set. Real-time system models that contain violations of timed safety
properties are not available in significant numbers. We therefore need to seed faults in
existing models and check whether those can be found by our automated repair. An
objective of mutation testing is that testing a proportion of the possible modifications
yields satisfactory results [18]. In order to evaluate repairs for erroneous clock bounds
in invariants and transition guards, we seed modifications to all bounds of clock
constraints by each of the amounts in $\{-10, -1, +1, +0.1 \cdot M, +M\}$, where $M$ is the maximal
bound a clock is compared against in a given model. If a thus seeded modification leads
to a syntactically invalid UPPAAL model, then UPPAAL returns an exception and we
ignore this modification. In analogy to mutation testing, we compute the count of TDTs
for which our analysis finds an admissible repair.
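The seeding strategy can be sketched as follows. The constraint identifiers, the dictionary representation of a model's clock bounds and the `is_valid` predicate (which here simply rejects negative bounds) are assumptions standing in for UPPAAL's own syntax check; the five amounts are the ones listed above.

```python
from typing import Callable, Dict, List, Tuple


def seed_mutants(
    bounds: Dict[str, float],
    is_valid: Callable[[float], bool] = lambda b: b >= 0,
) -> List[Tuple[str, float, float]]:
    """Return (constraint id, original bound, mutated bound) for each seeded modification."""
    m = max(bounds.values())  # M: maximal bound a clock is compared against
    amounts = [-10, -1, +1, 0.1 * m, m]
    mutants = []
    for cid, bound in sorted(bounds.items()):
        for amount in amounts:
            new_bound = bound + amount
            # stand-in for discarding models that UPPAAL rejects
            if is_valid(new_bound):
                mutants.append((cid, bound, new_bound))
    return mutants
```

Each mutant changes exactly one bound, so a model with $n$ clock constraints yields at most $5n$ modified models before invalid ones are discarded.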


Fig. 3. Control loop of TARTAR 


Experiments. We have applied this modification seeding strategy to eight UPPAAL 
models (see Table 1). Not all of the models that we considered have been published 
with a property that can be violated by mutating a clock constraint. For those models, we 
suggest a suitable timed safety property specifying an invariant condition. In particular, 
we add a property to the Bando [29] model which ensures that, for as long as the sender 
is active, its clock never exceeds the value of 28,116 time units. In the FDDI token 
ring protocol [29], the property that we use checks whether the first member of the ring 
never remains for more than 140 time units in any given state. The Viking model is 
taken from the set of test models of opaal [26]. For this model we use a property that 
checks whether one of the Viking processes can only enter a safe state during the first 
60 time units. Note that all of these properties are satisfied by the unmodified models. 

The results of the clock bound repair computed by TARTAR for all considered models
are summarized in Table 1. The seeded modifications are characterized quantitatively
by the count #Seed of analyzed modified models, the count #TDT of modified
models that return a TDT for the considered property, the maximal time $T_{UP}$ UPPAAL
needs to create a TDT per analyzed model, and the length Len. of the longest TDT
found. For the computation of a repair we give the count #Rep. of all repairs that were
computed, the count #Adm. of computed admissible repairs, the count of TDTs #Sol. for
which an admissible repair was found, the maximal time $T_{QE}$ that the quantifier elimination
required, the average time effort $T_R$ to compute a repair, the standard deviation $SD_R$
for the computation time of a repair, the time effort $T_{Adm}$ for an admissibility check, the
maximal count of variables #Var., and the maximal count of constraints #Con. used in
the MaxSMT instances. The maximal memory consumption was at most 17 MB for the repair analysis and
478 MB for the admissibility test. We performed all experiments on a computer with an
i7-6700K CPU (4.0 GHz), 60 GB of RAM and a Linux operating system.


Clock Bound Repair for Timed Systems 93 


We found 60 TDTs by seeding violations of the timed safety property and TARTAR 
returned 204 repairs for these TDTs. TARTAR proposed an admissible repair for 55 
(91%) TDTs and at least one repair for 57 (95%) TDTs. For 3 out of the total of 14 TDTs 
found for the SBR model no repair was computed since the timeout of the quantifier 
elimination was reached after 2 minutes. For all other models, no timeout occurred. 

Space limitations do not permit us to describe all models and computed repairs
in detail; we therefore focus on the pacemaker case study. One of the modifications
increases a location invariant of this model that controls the minimal heart period from
400 to 1,600. The modification allows the pacemaker to delay an induced ventricular
beat for too long, which violates the property that the time between two ventricular
beats of a heart is never longer than the maximal heart period of 1,000. TARTAR
finds three repairs. Two repairs reduce the maximal time delay between two ventricular
or atrial heart beats of the patient. These repairs are classified as inadmissible. In the
model context this appears to be reasonable since the repairs would restrict the environment
of the pacemaker, and not the pacemaker itself. The third repair is admissible and
reduces the seeded bound by 600.5. The minimal heart period is then below or equal
to the maximal heart period of 1,000.


Result Interpretation. Our repair strategy minimizes the number of repairs but does 
not optimize the computed value. For instance, in the pacemaker model the computed 
repair of 600.5 would be a correct and admissible repair even if the value was reduced 
to 600, which would be the minimal possible repair value. 
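Using the numbers given for the pacemaker model, and reading the repaired requirement as "the seeded bound of 1,600, reduced by a repair $\Delta$, must not exceed the maximal heart period of 1,000", the computed and the minimal repair compare as follows:

$$1600 - 600.5 = 999.5 \leq 1000, \qquad 1600 - 600 = 1000 \leq 1000, \qquad 1600 - \Delta \leq 1000 \iff \Delta \geq 600.$$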

A comparison of the values $T_{QE}$ and $T_R$ reveals that, perhaps unsurprisingly, the
quantifier elimination step is computationally almost an order of magnitude more
expensive than the repair computation. Overall, the computational cost ($T_{QE} + T_R$) correlates
with the number of variables in the constraint system, which depends in turn on
the length of the TDT and the number of clocks referenced along the TDT. Consider, for
instance, that the pacemaker model has a TDT of maximal length 9 with 116 variables,
and the repair requires 0.193 s and 2.070 MB. On the other hand, the Bando model produces
a longer maximal TDT of length 279 with 1,156 variables and requires 6.555 s
and 16.650 MB. The impact of the number of clock constraints and clock variables on
the computation costs can be seen, for instance, in the data for the pacemaker and Viking
models. While the pacemaker model has a shorter TDT than the Viking model (9 vs.
18), the constraint counts (294 vs. 140) of the pacemaker model are higher than for


Table 1. Experimental results for clock bound repair computation using TARTAR


Model                | #Seed | #TDT | T_UP    | Len. | #Rep. | #Adm. | #Sol. | T_QE     | T_R     | SD_R  | T_Adm    | #Var. | #Con.
---------------------|-------|------|---------|------|-------|-------|-------|----------|---------|-------|----------|-------|------
Repaired db (Fig. 2) |    35 |    6 | 0.006 s |    4 |    12 |    12 |     6 |  0.042 s | 0.023 s | 0.001 |  2.329 s |    25 |    40
CSMA/CD [17]         |    90 |    6 | 0.012 s |    2 |    36 |    16 |     6 |  0.020 s | 0.021 s | 0.000 |  3.060 s |    16 |    36
Elevator [8]         |    35 |    3 | 0.004 s |    1 |     6 |     6 |     3 |  0.071 s | 0.028 s | 0.005 |  2.374 s |     6 |    16
Viking               |    85 |    3 | 0.009 s |   18 |     6 |     6 |     3 |  0.032 s | 0.042 s | 0.002 |  2.821 s |   120 |   140
Bando [29]           |   740 |   12 | 0.259 s |  279 |    26 |    24 |    12 | 17.227 s | 6.555 s | 1.776 |  4.067 s | 1,156 | 2,441
Pacemaker [19]       |   240 |    7 | 0.044 s |    9 |    34 |    16 |     7 |  0.670 s | 0.193 s | 0.021 |  3.389 s |   116 |   294
SBR [23]             |    65 |   14 | 0.066 s |   81 |    42 |    26 |     9 | 20.776 s | 2.568 s | 0.441 | 34.120 s |   256 |   410
FDDI [29]            |   100 |    9 | 0.025 s |    5 |    42 |    30 |     9 |  0.046 s | 0.029 s | 0.001 |  2.493 s |    59 |    93
the Viking model, which coincides with a higher computation time (0.193 s vs. 0.042 s)
and a higher memory consumption (2.070 MB vs. 0.910 MB).

We analyzed for every TDT the relationship between the length of the TDT and the
computation time for a repair ($T_{QE} + T_R$), as well as the relationship between #Var
and $T_{QE} + T_R$, by estimating Kendall's tau [13]. Kendall's tau is a measure of the ordinal
association between two measured quantities. A correlation is considered significant
if the p-value, i.e., the probability of observing an association at least as strong as the
measured one when there is in fact no correlation, is below a certain threshold. The length
of a TDT is significantly related ($\tau = 0.673$, $p < .001$) to $T_{QE} + T_R$. Also #Var is
significantly related ($\tau = 0.759$, $p < .001$) to $T_{QE} + T_R$. #Var contains
clocks for every step of a TDT, hence the combination of trace length and clock count
tends to correlate more strongly than the trace length on its own. This supports our conjecture
that the computation time of a repair depends on the trace length and the clock count.
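Kendall's tau can be computed by counting concordant and discordant pairs of observations. The Python sketch below computes the tie-corrected variant $\tau_b$ and does not compute a p-value; the text does not state which variant or tool was used, so this is an illustration of the statistic rather than the authors' procedure.

```python
import math
from itertools import combinations
from typing import Sequence


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b rank correlation between two equally long samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    concordant = discordant = ties_x_only = ties_y_only = 0
    for i, j in combinations(range(len(x)), 2):
        dx = x[i] - x[j]
        dy = y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both samples: enters neither correction term
        if dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    # (pairs not tied in x) * (pairs not tied in y)
    denom = math.sqrt((concordant + discordant + ties_y_only)
                      * (concordant + discordant + ties_x_only))
    return (concordant - discordant) / denom if denom else 0.0
```

Applied to per-TDT pairs (trace length, $T_{QE} + T_R$), a value close to 1 means that longer traces almost always take longer to repair.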

The admissibility test appears to be quite efficient, with a maximum computation 
time of 34.120 s for the SBR model, which is one of the more complex models that 
were considered. We observed that most models were action-deterministic, which has a 
positive influence on the language equivalence test used during admissibility checking. 


7 Conclusion 


We have presented an approach to derive minimal repairs for timed reachability prop- 
erties of TA and NTA models from TDTs in order to facilitate fault localization and 
debugging of such models during the design process. Our approach includes a for- 
malization of TDTs using linear real arithmetic, a repair strategy based on MaxSMT 
solving, the definition of an admissibility criterion and test for the computed repairs, 
the development of a prototypical analysis and repair tool, and the application of the 
proposed method to a number of case studies of realistic complexity. To the best of our 
knowledge, this is the first rigorous treatment of counterexamples in real-time model 
checking. We are also not aware of any existing repair approaches for TA or NTA mod- 
els. This makes a comparative experimental evaluation impossible. We have nonetheless 
observed that our analysis computes a significant number of admissible repairs within 
realistic computation time bounds and memory consumption. 

Future research will address the development and implementation of repair strate- 
gies for further syntactic features in TAs and NTAs, including false comparison opera- 
tors in invariants and guards, erroneous clock variable references, superfluous or miss- 
ing resets for clocks, and wrong urgent state choices. We will furthermore address the 
interplay between different repairs and develop refined strategies to determine their 
admissibility. Finally, we plan to extend the approach developed in this paper to derive 
criteria for the actual causation of timing property violations in NTA models based on 
the counterfactual reasoning paradigm for causation. 
Abstract. This paper proposes a sound procedure to verify properties
of communicating session automata (CSA), i.e., communicating automata
that include multiparty session types. We introduce a new asynchronous
compatibility property for CSA, called k-multiparty compatibility (k-MC),
which is a strict superset of the synchronous multiparty compatibility 
used in theories and tools based on session types. It is decomposed into 
two bounded properties: (i) a condition called k-safety which guaran- 
tees that, within the bound, all sent messages can be received and each 
automaton can make a move; and (ii) a condition called k-exhaustivity 
which guarantees that all k-reachable send actions can be fired within 
the bound. We show that k-exhaustivity implies existential boundedness, 
and soundly and completely characterises systems where each automaton 
behaves equivalently under bounds greater than or equal to k. We show 
that checking k-MC is PSPACE-complete, and demonstrate its scalability
empirically over large systems (using partial order reduction). 


1 Introduction 


Communicating automata are a Turing-complete model of asynchronous interactions
[10] that has become one of the most prominent models for studying point-to-point
communications over unbounded first-in-first-out channels. This paper focuses
on a class of communicating automata, called communicating session automata
(CSA), which strictly includes automata corresponding to asynchronous multiparty
session types [28]. Session types originated as a typing discipline for the
π-calculus [27,66], where a session type dictates the behaviour of a process wrt.
its communications. Session types and related theories have been applied to the
verification and specification of concurrent and distributed systems through their
integration in several mainstream programming languages, e.g., Haskell [44,55],
Erlang [49], F# [48], Go [11,37,38,51], Java [30,31,34,65], OCaml [56], C [52],
Python [16,47,50], Rust [32], and Scala [61,62]. Communicating automata and
asynchronous multiparty session types [28] are closely related: the latter can be 
seen as a syntactical representation of the former [17] where a sending state cor- 
responds to an internal choice and a receiving state to an external choice. This 
correspondence between communicating automata and multiparty session types
has become the foundation of many tools centred on session types, e.g., for generating
communication APIs from multiparty session (global) types [30,31,48,61],
for detecting deadlocks in message-passing programs [51,67], and for monitor- 
ing session-enabled programs [5,16,47,49,50]. These tools rely on a property 
called multiparty compatibility [6,18,39], which guarantees that communicating 
automata representing session types interact correctly, hence enabling the iden- 
tification of correct protocols or the detection of errors in endpoint programs. 
Multiparty compatible communicating automata validate two essential require- 
ments for session types frameworks: every message that is sent can be eventually 
received and each automaton can always eventually make a move. Thus, they satisfy
the abstract safety invariant φ for session types from [63], a prerequisite for
session type systems to guarantee safety of the typed processes. Unfortunately, 
multiparty compatibility suffers from a severe limitation: it requires that each 
execution of the system has a synchronous equivalent. Hence, it rules out many 
correct systems. Hereafter, we refer to this property as synchronous multiparty 
compatibility (SMC) and explain its main limitation with Example 1. 


Example 1. The system in Fig. 1 contains an interaction pattern that is not supported
by any definition of SMC [6,18,39]. It consists of a client (c), a server (s),
and a logger (l), which communicate via unbounded FIFO channels. Transition
sr!a denotes that sender s puts (asynchronously) message a on channel sr; and
transition sr?a denotes the consumption of a from channel sr by receiver r. The
client sends a request and some data in a fire-and-forget fashion, before waiting
for a response from the server. Because of the presence of this simple pattern,
the system cannot be executed synchronously (i.e., with the restriction that a
send action can only be fired when a matching receive is enabled), hence it is
rejected by all definitions of SMC from previous works, even though the system
is safe (all sent messages are received and no automaton gets stuck).


Synchronous multiparty compatibility is reminiscent of a strong form of exis- 
tential boundedness. Among the existing sub-classes of communicating automata 
(see [46] for a survey), existentially k-bounded communicating automata [22] 
stand out because they can be model-checked [8,21] and they restrict the model 
in a natural way: any execution can be rescheduled such that the number of 
pending messages that can be received is bounded by k. However, existential 
boundedness is generally undecidable [22], even for a fixed bound k. This short- 
coming makes it impossible to know when theoretical results are applicable. 

Fig. 1. Client-Server-Logger example (automata $M_c$, $M_s$ and $M_l$; diagram not reproduced).

To address the limitation of SMC and the shortcoming of existential boundedness,
we propose a (decidable) sufficient condition for existential boundedness,
called k-exhaustivity, which serves as a basis for a wider notion of new compatibility,
called k-multiparty compatibility (k-MC) where $k \in \mathbb{N}_{>0}$ is a bound on the
number of pending messages in each channel. A system is k-MC when it is (i)
k-exhaustive, i.e., all k-reachable send actions are enabled within the bound, and
(ii) k-safe, i.e., within the bound k, all sent messages can be received and each
automaton can always eventually progress. For example, the system in Fig. 1 is
k-multiparty compatible for any $k \in \mathbb{N}_{>0}$, hence it does not lead to communication
errors, see Theorem 1. The k-MC condition is a natural constraint for real-world
systems. Indeed any finite-state system is k-exhaustive (for k sufficiently large), 
while any system that is not k-exhaustive (resp. k-safe) for any k is unlikely 
to work correctly. Furthermore, we show that if a system of CSA validates k- 
exhaustivity, then each automaton locally behaves equivalently under any bound 
greater than or equal to k, a property that we call local bound-agnosticity. We 
give a sound and complete characterisation of k-exhaustivity for CSA in terms of 
local bound-agnosticity, see Theorem 3. Additionally, we show that the complex- 
ity of checking k-MC is PSPACE-complete (i.e., no higher than related algorithms) 
and we demonstrate empirically that its cost can be mitigated through (sound 
and complete) partial order reduction. 

In this paper, we consider communicating session automata (CSA), which 
cover the most common form of asynchronous multiparty session types [15] (see 
Remark 3), and have been used as a basis to study properties and extensions of 
session types [6,7,18,30,31,41,42,47,49,50]. More precisely, CSA are determin- 
istic automata, whose every state is either sending (internal choice), receiving 
(external choice), or final. We focus on CSA that preserve the intent of internal 
and external choices from session types. In these CSA, whenever an automaton 
is in a sending state, it can fire any transition, no matter whether channels are 
bounded; when it is in a receiving state then at most one action must be enabled. 


Synopsis. In Sect. 2, we give the necessary background on communicating
automata and their properties, and introduce the notions of output/input bound
independence which guarantee that internal/external choices are preserved in
bounded semantics. In Sect. 3, we introduce the definition of k-multiparty compatibility
(k-MC) and show that k-MC systems are safe for systems which validate
the bound independence properties. In Sect. 4, we formally relate existential
boundedness [22,35], synchronisability [9], and k-exhaustivity. In Sect. 5 we
present an implementation (using partial order reduction) and an experimental
evaluation of our theory. We discuss related works in Sect. 6 and conclude in
Sect. 7.

See [43] for a full version of this paper (including proofs and additional exam- 
ples). Our implementation and benchmark data are available online [33]. 


2 Communicating Automata and Bound Independence 


This section introduces notations and definitions of communicating automata 
(following [12,39]), as well as the notion of output (resp. input) bound indepen- 
dence which enforces the intent of internal (resp. external) choice in CSA. 
Fix a finite set $\mathcal{P}$ of participants (ranged over by p, q, r, s, etc.) and a
finite alphabet $\Sigma$. The set of channels is $C = \{pq \mid p, q \in \mathcal{P} \text{ and } p \neq q\}$,
$A = C \times \{!, ?\} \times \Sigma$ is the set of actions (ranged over by $\ell$), $\Sigma^*$ (resp. $A^*$) is the
set of finite words on $\Sigma$ (resp. $A$). Let $w$ range over $\Sigma^*$, and $\phi, \psi$ range over $A^*$.
Also, $\epsilon$ ($\notin \Sigma \cup A$) is the empty word, $|w|$ denotes the length of $w$, and $w \cdot w'$ is
the concatenation of $w$ and $w'$ (these notations are overloaded for words in $A^*$).


Definition 1 (Communicating automaton). A communicating automaton
is a finite transition system given by a triple $M = (Q, q_0, \delta)$ where $Q$ is a finite
set of states, $q_0 \in Q$ is the initial state, and $\delta \subseteq Q \times A \times Q$ is a set of transitions.


The transitions of a communicating automaton are labelled by actions in $A$ of
the form sr!a, representing the emission of message a from participant s to r, or
sr?a representing the reception of a by r. Define $subj(pq!a) = subj(qp?a) = p$,
$obj(pq!a) = obj(qp?a) = q$, and $chan(pq!a) = chan(pq?a) = pq$. The projection
of $\ell$ onto p is defined as $\pi_p(\ell) = \ell$ if $subj(\ell) = p$ and $\pi_p(\ell) = \epsilon$ otherwise. Let $\dagger$
range over $\{!, ?\}$; we define $\pi^{\dagger}_{pq}(pq \dagger a) = a$ and $\pi^{\dagger}_{pq}(sr \dagger' a) = \epsilon$ if either $pq \neq sr$
or $\dagger \neq \dagger'$. We extend these definitions to sequences of actions in the natural way.

A state $q \in Q$ with no outgoing transition is final; q is sending (resp. receiving)
if it is not final and all its outgoing transitions are labelled by send
(resp. receive) actions, and q is mixed otherwise. $M = (Q, q_0, \delta)$ is deterministic
if $\forall (q, \ell, q'), (q, \ell', q'') \in \delta : \ell = \ell' \implies q' = q''$. $M = (Q, q_0, \delta)$
is send (resp. receive) directed if for all sending (resp. receiving) $q \in Q$ and
$(q, \ell, q'), (q, \ell', q'') \in \delta : obj(\ell) = obj(\ell')$. M is directed if it is send and receive
directed.
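The state classification, determinism and directedness conditions translate directly into checks on a transition list. In this Python sketch an action sr!a or sr?a is encoded, by assumption, as a tuple `(s, r, "!" or "?", a)`, so that `subj` and `obj` follow the definitions above; all function names are illustrative.

```python
from typing import Iterable, List, Set, Tuple

Action = Tuple[str, str, str, str]    # (sender, receiver, "!" or "?", message)
Transition = Tuple[str, Action, str]  # (source state, action, target state)


def subj(l: Action) -> str:
    s, r, kind, _ = l
    return s if kind == "!" else r  # sender for sr!a, receiver for sr?a


def obj(l: Action) -> str:
    s, r, kind, _ = l
    return r if kind == "!" else s


def kind_of(state: str, delta: List[Transition]) -> str:
    kinds = {l[2] for (q, l, _) in delta if q == state}
    if not kinds:
        return "final"
    if kinds == {"!"}:
        return "sending"
    if kinds == {"?"}:
        return "receiving"
    return "mixed"


def is_deterministic(delta: Iterable[Transition]) -> bool:
    target = {}
    for q, l, q2 in delta:
        # the same (state, label) pair must always lead to the same target
        if target.setdefault((q, l), q2) != q2:
            return False
    return True


def is_directed(states: Set[str], delta: Iterable[Transition], kind: str) -> bool:
    """Send directed for kind="sending", receive directed for kind="receiving"."""
    delta = list(delta)
    for q in states:
        if kind_of(q, delta) == kind:
            if len({obj(l) for (src, l, _) in delta if src == q}) > 1:
                return False
    return True
```

A CSA in the sense of Remark 1 is then an automaton for which `is_deterministic` holds and no state is classified as "mixed".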


Remark 1. In this paper, we consider only deterministic communicating 
automata without mixed states, and call them Communicating Session 
Automata (CSA). We discuss possible extensions of our results beyond this class 
in Sect. 7. 


Definition 2 (System). Given a communicating automaton $M_p = (Q_p, q_{0p}, \delta_p)$
for each $p \in \mathcal{P}$, the tuple $S = (M_p)_{p \in \mathcal{P}}$ is a system. A configuration of S is a
pair $s = (q; w)$ where $q = (q_p)_{p \in \mathcal{P}}$ with $q_p \in Q_p$ and where $w = (w_{pq})_{pq \in C}$
with $w_{pq} \in \Sigma^*$; component $q$ is the control state and $q_p \in Q_p$ is the local state of
automaton $M_p$. The initial configuration of S is $s_0 = (q_0; \epsilon)$ where $q_0 = (q_{0p})_{p \in \mathcal{P}}$
and we write $\epsilon$ for the $|C|$-tuple $(\epsilon, \ldots, \epsilon)$.


Hereafter, we fix a communicating session automaton $M_p = (Q_p, q_{0p}, \delta_p)$ for
each $p \in \mathcal{P}$ and let $S = (M_p)_{p \in \mathcal{P}}$ be the corresponding system whose initial
configuration is $s_0$. For each $p \in \mathcal{P}$, we assume that $\forall (q, \ell, q') \in \delta_p : subj(\ell) = p$.
We assume that the components of a configuration are named consistently: e.g.,
for $s' = (q'; w')$, we implicitly assume that $q' = (q'_p)_{p \in \mathcal{P}}$ and $w' = (w'_{pq})_{pq \in C}$.


Definition 3 (Reachable configuration). Configuration $s' = (q'; w')$ is
reachable from configuration $s = (q; w)$ by firing transition $\ell$, written $s \xrightarrow{\ell} s'$
(or $s \rightarrow s'$ when $\ell$ is not relevant), if there are $s, r \in \mathcal{P}$ and $a \in \Sigma$ such that
either:

1. (a) $\ell = sr!a$ and $(q_s, \ell, q'_s) \in \delta_s$, (b) $q'_p = q_p$ for all $p \neq s$, (c) $w'_{sr} = w_{sr} \cdot a$
and $w'_{pq} = w_{pq}$ for all $pq \neq sr$; or

2. (a) $\ell = sr?a$ and $(q_r, \ell, q'_r) \in \delta_r$, (b) $q'_p = q_p$ for all $p \neq r$, (c) $w_{sr} = a \cdot w'_{sr}$,
and $w'_{pq} = w_{pq}$ for all $pq \neq sr$.
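The two rules of Definition 3 can be sketched as a successor function. This Python sketch assumes that a configuration is encoded as a pair of tuples (local states in a fixed participant order, queue contents in a fixed channel order) and that an action sr!a or sr?a is encoded as `(s, r, "!" or "?", a)`; these encodings are hypothetical.

```python
from typing import Dict, List, Tuple

Action = Tuple[str, str, str, str]    # (sender, receiver, "!" or "?", message)
Transition = Tuple[str, Action, str]  # (source state, action, target state)
Config = Tuple[Tuple[str, ...], Tuple[Tuple[str, ...], ...]]  # (control state, queues)


def step(config: Config, participants: List[str], delta: Dict[str, List[Transition]],
         channels: List[Tuple[str, str]]) -> List[Tuple[Action, Config]]:
    """Return every (action, successor) pair allowed by the two rules of Definition 3."""
    states, queues = config
    result = []
    for i, p in enumerate(participants):
        for source, action, target in delta[p]:
            if source != states[i]:
                continue
            s, r, kind, a = action
            ch = channels.index((s, r))
            new_states = states[:i] + (target,) + states[i + 1:]  # only p moves
            if kind == "!":
                # rule 1: the sender appends a to the tail of channel sr
                new_w = queues[ch] + (a,)
            elif queues[ch][:1] == (a,):
                # rule 2: the receiver consumes a from the head of channel sr
                new_w = queues[ch][1:]
            else:
                continue  # a is not at the head of sr, so the receive is not enabled
            new_queues = queues[:ch] + (new_w,) + queues[ch + 1:]
            result.append((action, (new_states, new_queues)))
    return result
```

Since sends are always enabled in the unbounded semantics, bounds only enter later, through the k-bounded executions defined next.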


Remark 2. Hereafter, we assume that any bound k is finite and $k \in \mathbb{N}_{>0}$.


We write $\rightarrow^*$ for the reflexive and transitive closure of $\rightarrow$. Configuration
$(q; w)$ is k-bounded if $\forall pq \in C : |w_{pq}| \leq k$. We write $s_1 \xrightarrow{\ell_1 \cdots \ell_n} s_{n+1}$ when
$s_1 \xrightarrow{\ell_1} s_2 \cdots s_n \xrightarrow{\ell_n} s_{n+1}$, for some $s_2, \ldots, s_n$ (with $n \geq 0$); and say that the
execution $\ell_1 \cdots \ell_n$ is k-bounded from $s_1$ if $\forall 1 \leq i \leq n+1 : s_i$ is k-bounded. Given
$\phi \in A^*$, we write $p \notin \phi$ iff $\phi = \phi_0 \cdot \ell \cdot \phi_1 \implies subj(\ell) \neq p$. We write $s \xrightarrow{\phi}_k s'$
if $s'$ is reachable with a k-bounded execution $\phi$ from $s$. The set of reachable
configurations of S is $RS(S) = \{s \mid s_0 \rightarrow^* s\}$. The k-reachability set of S is
the largest subset $RS_k(S)$ of $RS(S)$ within which each configuration s can be
reached by a k-bounded execution from $s_0$.
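Since every configuration on a k-bounded execution is k-bounded, $RS_k(S)$ can be computed for finite-state automata by a breadth-first search that discards successors holding more than $k$ messages in some queue. The Python sketch repeats a compact successor function so that it is self-contained; the configuration and action encodings are assumptions.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Action = Tuple[str, str, str, str]    # (sender, receiver, "!" or "?", message)
Transition = Tuple[str, Action, str]
Config = Tuple[Tuple[str, ...], Tuple[Tuple[str, ...], ...]]  # (local states, queues)


def step(config: Config, participants: List[str], delta: Dict[str, List[Transition]],
         channels: List[Tuple[str, str]]) -> List[Tuple[Action, Config]]:
    states, queues = config
    out = []
    for i, p in enumerate(participants):
        for src, action, tgt in delta[p]:
            if src != states[i]:
                continue
            s, r, kind, a = action
            ch = channels.index((s, r))
            if kind == "!":
                w = queues[ch] + (a,)  # send: append to the tail
            elif queues[ch][:1] == (a,):
                w = queues[ch][1:]     # receive: consume the head
            else:
                continue
            out.append((action, (states[:i] + (tgt,) + states[i + 1:],
                                 queues[:ch] + (w,) + queues[ch + 1:])))
    return out


def k_reachable(init: Config, participants: List[str],
                delta: Dict[str, List[Transition]],
                channels: List[Tuple[str, str]], k: int) -> Set[Config]:
    """RS_k(S): configurations reachable from init by k-bounded executions."""
    seen = {init}
    todo = deque([init])
    while todo:
        for _, succ in step(todo.popleft(), participants, delta, channels):
            # keep only k-bounded successors: every queue holds at most k messages
            if all(len(w) <= k for w in succ[1]) and succ not in seen:
                seen.add(succ)
                todo.append(succ)
    return seen
```

Because the local state spaces are finite and every queue is bounded by k, the search always terminates, in contrast to exploring $RS(S)$.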

Definition 4 streamlines notions of safety from previous works [6,12,18,39] 
(absence of deadlocks, orphan messages, and unspecified receptions). 


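Definition 4 (k-Safety). S is k-safe if the following holds $\forall (q; w) \in RS_k(S)$:

(ER) $\forall pq \in C$, if $w_{pq} = a \cdot w'$, then $(q; w) \rightarrow_k^* \xrightarrow{pq?a}_k$.

(PG) $\forall p \in \mathcal{P}$, if $q_p$ is receiving, then $(q; w) \rightarrow_k^* \xrightarrow{qp?a}_k$ for some $q \in \mathcal{P}$ and $a \in \Sigma$.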


We say that S is safe if it validates the unbounded version of k-safety ($\infty$-safe).


Property (ER), called eventual reception, requires that any sent message can 
always eventually be received (i.e., if a is the head of a queue then there must 
be an execution that consumes a), and Property (PG), called progress, requires 
that any automaton in a receiving state can eventually make a move (i.e., it can 
always eventually receive an expected message). 
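Given the finite graph of k-bounded steps between the configurations of $RS_k(S)$ (assumed to be computed beforehand), checking (ER) and (PG) reduces to reachability questions on that graph. In the Python sketch below the graph is a dictionary from each configuration to its outgoing `(action, successor)` pairs and must contain every configuration of $RS_k(S)$ as a key; `is_receiving` is a hypothetical callback that tells whether a local state is receiving, and the encodings are assumptions.

```python
from collections import deque
from typing import Callable, Dict, List, Sequence, Set, Tuple

Action = Tuple[str, str, str, str]  # (sender, receiver, "!" or "?", message)
Config = Tuple[Tuple[str, ...], Tuple[Tuple[str, ...], ...]]  # (local states, queues)
Graph = Dict[Config, List[Tuple[Action, Config]]]  # k-bounded steps inside RS_k(S)


def reachable_from(graph: Graph, start: Config) -> Set[Config]:
    """All configurations reachable from start by k-bounded steps (reflexive)."""
    seen, todo = {start}, deque([start])
    while todo:
        for _, nxt in graph.get(todo.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


def is_k_safe(graph: Graph, participants: Sequence[str],
              channels: Sequence[Tuple[str, str]],
              is_receiving: Callable[[str, str], bool]) -> bool:
    for config in graph:
        states, queues = config
        # every action that fires from some configuration reachable from config
        fireable = {a for c in reachable_from(graph, config) for a, _ in graph.get(c, [])}
        # (ER): the head of each non-empty queue sr can eventually be received
        for (s, r), w in zip(channels, queues):
            if w and (s, r, "?", w[0]) not in fireable:
                return False
        # (PG): each automaton in a receiving state can eventually receive something
        for p, q in zip(participants, states):
            if is_receiving(p, q) and not any(
                kind == "?" and rcv == p for (_, rcv, kind, _) in fireable
            ):
                return False
    return True
```

By FIFO order, any reachable firing of sr?a on channel sr implies that the current head a is consumed first, which is why checking for the label suffices for (ER).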

We say that a configuration s is stable iff $s = (q; \epsilon)$, i.e., all its queues
are empty. Next, we define the stable property for systems of communicating
automata, following the definition from [18].


Definition 5 (Stable). S has the stable property (SP) if $\forall s \in RS(S) : \exists (q; \epsilon) \in RS(S) : s \rightarrow^* (q; \epsilon)$.


A system has the stable property if it is possible to reach a stable config- 
uration from any reachable configuration. This property is called deadlock-free 
in [22]. The stable property implies the eventual reception property, but not 
safety (e.g., an automaton may be waiting for an input in a stable configuration, 
see Example 2), and safety does not imply the stable property, see Example 4. 
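On a finite set of configurations, the stable property can be decided by a backward search from the stable configurations. The Python sketch applies this to an explicitly given graph, using the same hypothetical encoding (configuration = local states plus queues). Since $RS(S)$ may be infinite, running it on a finite graph, such as the one of k-bounded steps, only checks the property for that graph.

```python
from collections import deque
from typing import Dict, List, Tuple

Action = Tuple[str, str, str, str]  # (sender, receiver, "!" or "?", message)
Config = Tuple[Tuple[str, ...], Tuple[Tuple[str, ...], ...]]  # (local states, queues)


def has_stable_property(graph: Dict[Config, List[Tuple[Action, Config]]]) -> bool:
    """True iff every configuration of the graph can reach one with all queues empty."""
    preds: Dict[Config, List[Config]] = {c: [] for c in graph}
    for c, edges in graph.items():
        for _, nxt in edges:
            preds.setdefault(nxt, []).append(c)
    # stable configurations: all queues are empty
    stable = [c for c in preds if all(len(w) == 0 for w in c[1])]
    can_stabilise = set(stable)
    todo = deque(stable)
    while todo:  # backward search from the stable configurations
        for pred in preds[todo.popleft()]:
            if pred not in can_stabilise:
                can_stabilise.add(pred)
                todo.append(pred)
    return can_stabilise == set(preds)
```

Note that the check says nothing about automata waiting for input in a stable configuration, which is exactly the gap between the stable property and safety illustrated by Example 2.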


Example 2. The following system has the stable property, but it is not safe.


Automata $M_p$ (transitions pq!a, pq!b), $M_q$ (transitions pq?a, pq?b, qr!c) and $M_r$ (transition qr?c); diagram not reproduced.
Next, we define two properties related to bound independence. They specify
classes of CSA whose branching behaviours are not affected by channel bounds.


Definition 6 (k-OBI). S is k-output bound independent (k-OBI), if $\forall s = (q; w) \in RS_k(S)$ and $\forall p \in \mathcal{P}$, if $s \xrightarrow{pq!a}_k$, then $\forall (q_p, pr!b, q'_p) \in \delta_p : s \xrightarrow{pr!b}_k$.


Fig. 2. Example of a non-IBI and non-safe system (automata $M_p$, $M_q$ and $M_r$; diagram not reproduced).


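Definition 7 (k-IBI). S is k-input bound independent (k-IBI), if $\forall s = (q; w) \in RS_k(S)$ and $\forall p \in \mathcal{P}$, if $s \xrightarrow{qp?a}_k$, then $\forall \ell \in A : s \xrightarrow{\ell}_k \land subj(\ell) = p \implies \ell = qp?a$.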


If S is k-OBI, then any automaton that reaches a sending state is able to
fire any of its available transitions, i.e., sending states model internal choices
which are not constrained by bounds greater than or equal to k. Note that the
unbounded version of k-OBI ($k = \infty$) is trivially satisfied for any system due to
unbounded asynchrony. If S is k-IBI, then any automaton that reaches a receiving
state is able to fire at most one transition, i.e., receiving states model external
choices where the behaviour of the receiving automaton is controlled exclusively
by its environment. We write IBI for the unbounded version of k-IBI ($k = \infty$).

Checking the IBI property is generally undecidable. However, systems consisting
of (send and receive) directed automata are trivially k-IBI and k-OBI for
all k; this subclass of CSA was referred to as basic in [18]. We introduce larger
decidable approximations of IBI with Definitions 10 and 11.


Proposition 1. (1) If S is send directed, then S is k-OBI for all $k \in \mathbb{N}_{>0}$. (2) If
S is receive directed, then S is IBI (and k-IBI for all $k \in \mathbb{N}_{>0}$).


Remark 3. CSA validating k-OBI and IBI strictly include the most common forms 
of asynchronous multiparty session types, e.g., the directed CSA of [18], and sys- 
tems obtained by projecting Scribble specifications (global types) which need to 
be receive directed (this is called “consistent external choice subjects” in [31]) and 
which validate 1-OBI by construction since they are projections of synchronous 
specifications where choices must be located at a unique sender. 


3 Bounded Compatibility for CSA 


In this section, we introduce k-multiparty compatibility (k-MC) and study its 
properties wrt. safety of communicating session automata (CSA) which are k-OBI
and IBI. Then, we soundly and completely characterise k-exhaustivity in terms 
of local bound-agnosticity, a property which guarantees that communicating 
automata behave equivalently under any bound greater than or equal to k. 
3.1 Multiparty Compatibility


The definition of k-MC is divided into two parts: (i) k-exhaustivity guarantees that
the set of k-reachable configurations contains enough information to make a 
sound decision wrt. safety of the system; and (ii) k-safety (Definition 4) guaran- 
tees that a subset of all possible executions is free of any communication errors. 
Next, we define k-exhaustivity, then k-multiparty compatibility. Intuitively, a 
system is k-exhaustive if for all k-reachable configurations, whenever a send 
action is enabled, then it can be fired within a k-bounded execution. 


[Figure: communicating automata $M_p$, $M_q$, $N_q$, and $N'_q$]


Fig. 3. $(M_p, M_q)$ is non-exhaustive, $(M_p, N_q)$ is 1-exhaustive, $(M_p, N'_q)$ is 2-exhaustive.


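Definition 8 (k-Exhaustivity). S is k-exhaustive if $\forall (\vec{q}; \vec{w}) \in RS_k(S)$ and
$\forall p \in P$, if $q_p$ is sending, then $\forall (q_p, \ell, q'_p) \in \delta_p : \exists \phi \in A^* : (\vec{q}; \vec{w}) \xrightarrow{\phi}_k \xrightarrow{\ell}_k \land p \notin \phi$.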
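For small examples, Definition 8 can also be checked by exhaustive search. The Python sketch below is our own illustration and is not the paper's implementation, which works on a partial-order-reduced transition system [43]. The dictionary encoding of systems and the function names are assumptions. The sketch first computes the k-reachable configurations. Then, for every participant p in a sending state and every outgoing send ℓ of p, it looks for some k-bounded execution φ in which p never moves and after which ℓ can fire.

```python
from collections import deque

# Hypothetical encoding: system[p] = (initial_state, transitions), each transition being
# (source, (sender, receiver, kind, message), target) with kind "!" or "?".


def moves(system, config, k):
    """Yield (participant, transition, successor) for every k-bounded step from config."""
    states, queues = dict(config[0]), dict(config[1])
    for p, (_, trans) in system.items():
        for src, act, dst in trans:
            if src != states[p]:
                continue
            sender, receiver, kind, msg = act
            chan = (sender, receiver)
            q = queues.get(chan, ())
            if kind == "!" and len(q) < k:
                new_q = q + (msg,)
            elif kind == "?" and q[:1] == (msg,):
                new_q = q[1:]
            else:
                continue
            new_states = tuple(sorted({**states, p: dst}.items()))
            new_queues = tuple(sorted((c, m) for c, m in {**queues, chan: new_q}.items() if m))
            yield p, (src, act, dst), (new_states, new_queues)


def reachable(system, k, start, frozen=None):
    """Configurations reachable from start by k-bounded steps in which `frozen` never moves."""
    seen, todo = {start}, deque([start])
    while todo:
        for p, _, nxt in moves(system, todo.popleft(), k):
            if p != frozen and nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


def is_k_exhaustive(system, k):
    init = (tuple(sorted((p, s0) for p, (s0, _) in system.items())), ())
    for config in reachable(system, k, init):
        states = dict(config[0])
        for p, (_, trans) in system.items():
            outgoing = [t for t in trans if t[0] == states[p]]
            if not outgoing or outgoing[0][1][2] != "!":
                continue  # only sending states are constrained (CSA have no mixed states)
            for t in outgoing:
                # Look for some phi without p after which transition t can fire.
                if not any(t in [tr for who, tr, _ in moves(system, c, k) if who == p]
                           for c in reachable(system, k, config, frozen=p)):
                    return False
    return True


# p keeps sending a; q either consumes it (loop) or never does (orphan).
loop = {"p": (0, [(0, ("p", "q", "!", "a"), 0)]), "q": (0, [(0, ("p", "q", "?", "a"), 0)])}
orphan = {"p": (0, [(0, ("p", "q", "!", "a"), 0)]), "q": (0, [])}
print(is_k_exhaustive(loop, 1), is_k_exhaustive(orphan, 1))  # True False
```

The `orphan` system fails for every k: once the channel pq is full, no execution that avoids p can empty it.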


Definition 9 (k-Multiparty compatibility). S is k-multiparty compatible 
(k-MC) if it is k-safe and k-exhaustive. 


Definition 9 is a natural extension of the definitions of synchronous multi- 
party compatibility given in [18, Definition 4.2] and [6, Definition 4]. The com- 
mon key requirements are that every send action must be matched by a receive 
action (i.e., send actions are universally quantified), while at least one receive 
action must find a matching send action (i.e., receive actions are existentially 
quantified). Here, the universal check on send actions is done via the eventual 
reception property (ER) and the k-exhaustivity condition; while the existential 
check on receive actions is dealt with by the progress property (PG). 

When systems are k-OBI and IBI, k-exhaustivity implies that k-bounded
executions are sufficient to make a sound decision wrt. safety. This is
not necessarily the case for systems outside of this class, see Examples 3 and 5.


Example 3. The system $(M_p, M_q, M_r)$ in Fig. 2 is k-OBI for any k, but not IBI
(it is 1-IBI but not k-IBI for any $k \geq 2$). When executing with a bound strictly
greater than 1, there is a configuration where M, is in its initial state and both 
its receive transitions are enabled. The system is 1-safe and 1-exhaustive (hence 
1-MC), but it is neither 2-exhaustive nor 2-safe. By constraining the automata to
execute with a channel bound of 1, the left branch of M, is prevented to execute 
together with the right branch of M,. Thus, the fact that the y messages are not 
received in this case remains invisible in 1-bounded executions. This example can 
be easily extended so that it is n-exhaustive (resp. safe) but not (n+1)-exhaustive
(resp. safe) by sending/receiving n+1 $a_i$ messages.
Example 4. The system in Fig. 1 is directed and 1-MC. The system $(M_p, M_q)$ in
Fig. 3 is safe but not k-MC for any finite $k \in \mathbb{N}_{>0}$. Indeed, for any execution
of this system, at least one of the queues grows arbitrarily large. The system
$(M_p, N_q)$ is 1-MC, while the system $(M_p, N'_q)$ is not 1-MC but is 2-MC.


[Figure: communicating automata $M_p$, $M_q$, $M_r$, and $M_s$]


Fig. 4. Example of a system which is not 1-OBI. 


Example 5. The system in Fig. 4 (without the dotted transition) is 1-MC, but not
2-safe; it is not 1-OBI but it is 2-OBI. In 1-bounded executions, $M_r$ can execute
$rs!b \cdot rp!z$, but it cannot fire $rs!b \cdot rs!a$ (queue rs is full), which violates the
1-OBI property. The system with the dotted transition is not 1-OBI, but it is
2-OBI and k-MC for any $k > 1$. Both systems are receive directed, hence IBI.


Theorem 1. If S is k-OBI, IBI, and k-MC, then it is safe.


Remark 4. It is undecidable whether there exists a bound k for which an arbitrary
system is k-MC. This is a consequence of the Turing completeness of communicating
(session) automata [10,20,42].


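Although the IBI property is generally undecidable, it is possible to identify
sound approximations, as we show below. We adapt the dependency relation
from [39] and say that action $\ell'$ depends on $\ell$ from $s = (\vec{q}; \vec{w})$, written $s \vdash \ell \prec \ell'$,
iff $\mathit{subj}(\ell) = \mathit{subj}(\ell') \lor (\mathit{chan}(\ell) = \mathit{chan}(\ell') \land w_{\mathit{chan}(\ell)} = \epsilon)$. Action $\ell'$ depends
on $\ell$ in $\phi$ from s, written $s \vdash \ell \prec_\phi \ell'$, if the following holds:

$$
s \vdash \ell \prec_\phi \ell' \iff
\begin{cases}
(s \vdash \ell \prec \ell'' \land s \vdash \ell'' \prec_\psi \ell') \lor s \vdash \ell \prec_\psi \ell' & \text{if } \phi = \ell'' \cdot \psi \\
s \vdash \ell \prec \ell' & \text{otherwise}
\end{cases}
$$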


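Definition 10. S is k-chained input bound independent (k-CIBI) if $\forall s = (\vec{q}; \vec{w}) \in RS_k(S)$ and $\forall p \in P$, if $s \xrightarrow{qp?a}_k s'$, then $\forall (q_p, sp?b, q'_p) \in \delta_p : s \neq q \implies \neg(s \xrightarrow{sp?b}_k) \land (\forall \phi \in A^* : s' \xrightarrow{\phi}_k \xrightarrow{sp!b}_k \implies s \vdash qp?a \prec_\phi sp!b)$.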

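Definition 11. S is k-strong input bound independent (k-SIBI) if $\forall s = (\vec{q}; \vec{w}) \in RS_k(S)$ and $\forall p \in P$, if $s \xrightarrow{qp?a}_k s'$, then $\forall (q_p, sp?b, q'_p) \in \delta_p : s \neq q \implies \neg(s \xrightarrow{sp?b}_k \lor s' \to_k^{*} \xrightarrow{sp!b}_k)$.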
Definition 10 requires that whenever p can fire a receive action, at most
one of its receive actions is enabled at s, and no other receive transition from 
qp will be enabled until p has made a move. This is due to the existence of a 
dependency chain between the reception of a message (qp?a) and the matching 
send of another possible reception (sp!b). Property k-SIBI (Definition 11) is a 
stronger version of k-CIBI, which can be checked more efficiently. 


Lemma 1. If S is k-OBI, k-CIBI (resp. k-SIBI) and k-exhaustive, then it is IBI. 


The decidability of k-OBI, k-IBI, k-SIBI, k-CIBI, and k-MC is straightforward
since both $RS_k(S)$ (which has an exponential number of states wrt. k) and $\to_k$
are finite, given a finite k. Theorem 2 states the space complexity of the procedures,
except for k-CIBI, for which a complexity class is yet to be determined. We
show that the properties are in PSPACE by reducing to an instance of the reachability
problem over a transition system built following the construction of Bollig
et al. [8, Theorem 6.3]. The rest of the proof follows from similar arguments in 
Genest et al. [22, Proposition 5.5] and Bouajjani et al. [9, Theorem 3]. 


Theorem 2. The problems of checking the k-OBI, k-IBI, k-SIBI, k-safety, and 
k-exhaustivity properties are all decidable and PSPACE-complete (with $k \in \mathbb{N}_{>0}$
given in unary). The problem of checking the k-CIBI property is decidable. 


3.2 Local Bound-Agnosticity 


We introduce local bound-agnosticity and show that it fully characterises k-exhaustive
systems. Local bound-agnosticity guarantees that each communicating
automaton behaves in the same manner for any bound greater than or equal to
some k. Therefore such systems may be executed transparently under a bounded 
semantics (a communication model available in Go and Rust). 


Definition 12 (Transition system). The k-bounded transition system of S is
the labelled transition system (LTS) $TS_k(S) = (N, s_0, \Delta)$ such that $N = RS_k(S)$,
$s_0$ is the initial configuration of S, $\Delta \subseteq N \times A \times N$ is the transition relation, and

$(s, \ell, s') \in \Delta$ if and only if $s \xrightarrow{\ell}_k s'$.


Definition 13 (Projection). Let T be an LTS over A. The projection of T
onto p, written $\pi^\epsilon_p(T)$, is obtained by replacing each label $\ell$ in T by $\pi_p(\ell)$.


Recall that the projection of action $\ell$, written $\pi_p(\ell)$, is defined in Sect. 2.
The automaton $\pi^\epsilon_p(TS_k(S))$ is essentially the local behaviour of participant p
within the transition system $TS_k(S)$. When each automaton in a system S
behaves equivalently for any bound greater than or equal to some k, we say
that S is locally bound-agnostic. Formally, S is locally bound-agnostic for k
when $\pi^\epsilon_p(TS_k(S))$ and $\pi^\epsilon_p(TS_n(S))$ are weakly bisimilar ($\approx$) for each participant
p and any $n \geq k$. For k-OBI and IBI systems, local bound-agnosticity is a necessary
and sufficient condition for k-exhaustivity, as stated in Theorem 3 and
Corollary 1.
Theorem 3. Let S be a system.

(1) If $\exists k \in \mathbb{N}_{>0} : \forall p \in P : \pi^\epsilon_p(TS_k(S)) \approx \pi^\epsilon_p(TS_{k+1}(S))$, then S is k-exhaustive.
(2) If S is k-OBI, IBI, and k-exhaustive, then $\forall p \in P : \pi^\epsilon_p(TS_k(S)) \approx \pi^\epsilon_p(TS_{k+1}(S))$.


Corollary 1. Let S be k-OBI and IBI s.t. $\forall p \in P : \pi^\epsilon_p(TS_k(S)) \approx \pi^\epsilon_p(TS_{k+1}(S))$;
then S is locally bound-agnostic for k.


Theorem 3 (1) is reminiscent of the (PSPACE-complete) checking procedure 
for existentially bounded systems with the stable property [22] (an undecidable 
property). Recall that k-exhaustivity is not sufficient to guarantee safety, see 
Examples 3 and 5. We give an effective procedure (based on partial order reduc- 
tion) to check k-exhaustivity and related properties in [43]. 


[Figure: classes within k-OBI and IBI Communicating Session Automata: ∃S-k-bounded (Def. 16), ∃-k-bounded (Def. 15), k-synchronisable (Def. 17), and eventual reception]


Fig. 5. Relations between k-exhaustivity, existential k-boundedness, and k-synchronis- 
ability in k-OBI and IBI CSA (the circled numbers refer to Table 1). 


4 Existentially Bounded and Synchronisable Automata 


4.1 Kuske and Muscholl’s Existential Boundedness 


Existentially bounded communicating automata [21,22,35] are a class of com- 
municating automata whose executions can always be scheduled in such a way 
that the number of pending messages is bounded by a given value. Traditionally, 
existentially bounded communicating automata are defined on communicating 
automata that feature (local) accepting states and in terms of accepting runs. 
An accepting run is an execution (starting from $s_0$) which terminates in a configuration
$(\vec{q}; \vec{w})$ where each $q_p$ is a local accepting state. In our setting, we simply
consider that every local state $q_p$ is an accepting state, hence any execution $\phi$
starting from $s_0$ is an accepting run. We first study existential boundedness as
defined in [35], as it matches k-exhaustivity more closely; we study the “classical”
definition of existential boundedness [22] in Sect. 4.2.

Following [35], we say that an execution $\phi \in A^*$ is valid if for any prefix $\psi$
of $\phi$ and any channel $pq \in C$, we have that $\pi^?_{pq}(\psi)$ is a prefix of $\pi^!_{pq}(\psi)$, i.e., an
execution is valid if it models the FIFO semantics of communicating automata.
Definition 14 (Causal equivalence [35]). Given $\phi, \psi \in A^*$, we define: $\phi \approx \psi$
iff $\phi$ and $\psi$ are valid executions and $\forall p \in P : \pi_p(\phi) = \pi_p(\psi)$. We write $[\phi]_\approx$ for
the equivalence class of $\phi$ wrt. $\approx$.


Definition 15 (Existential boundedness [35]). We say that a valid execution
$\phi$ is k-match-bounded if, for every prefix $\psi$ of $\phi$, the difference between the
number of matched events of type pq! and those of type pq? is bounded by k,
i.e., $\min\{|\pi^!_{pq}(\psi)|, |\pi^?_{pq}(\phi)|\} - |\pi^?_{pq}(\psi)| \leq k$.

Write $A^*|_k$ for the set of k-match-bounded words. An execution $\phi$ is existentially
k-bounded if $[\phi]_\approx \cap A^*|_k \neq \emptyset$. A system S is existentially k-bounded, written ∃-k-bounded,
if each execution in $\{\phi \mid \exists s : s_0 \xrightarrow{\phi} s\}$ is existentially k-bounded.


Example 6. Consider Fig. 3. $(M_p, M_q)$ is not existentially k-bounded, for any k:
at least one of the queues must grow without bound for the system to progress. Systems
$(M_p, N_q)$ and $(M_p, N'_q)$ are existentially bounded since any of their executions
can be scheduled to an $\approx$-equivalent execution which is 2-match-bounded.


The relationship between k-exhaustivity and existential boundedness is 
stated in Theorem 4 and illustrated in Fig. 5 for k-OBI and IBI CSA, where SMC
refers to synchronous multiparty compatibility [18, Definition 4.2]. The circled 
numbers in the figure refer to key examples summarised in Table 1. The strict 
inclusion of k-exhaustivity in existential k-boundedness is due to systems that 
do not have the eventual reception property, see Example 7. 


Example 7. The system below is ∃-1-bounded but is not k-exhaustive for any k.
[Figure: communicating automata $M_p$, $M_s$, and $M_r$]


For any k, the channel sp eventually gets full and the send action sp!b can no 
longer be fired; hence it does not satisfy k-exhaustivity. Note that each execution 
can be reordered into a 1-match-bounded execution (the b’s are never matched). 


Theorem 4. (1) If S is k-OBI, IBI, and k-exhaustive, then it is ∃-k-bounded.
(2) If S is ∃-k-bounded and satisfies eventual reception, then it is k-exhaustive.


4.2 Existentially Stable Bounded Communicating Automata 


The “classical” definition of existentially bounded communicating automata as 
found in [22] differs slightly from Definition 15, as it relies on a different notion 
of accepting runs, see [22, page 4]. Assuming that all local states are accepting, 
we adapt their definition as follows: a stable accepting run is an execution $\phi$
starting from $s_0$ which terminates in a stable configuration.


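Definition 16 (Existential stable boundedness [22]). A system S is existentially
stable k-bounded, written ∃S-k-bounded, if for each execution $\phi$ in
$\{\phi \mid \exists (\vec{q}; \vec{\epsilon}) \in RS(S) : s_0 \xrightarrow{\phi} (\vec{q}; \vec{\epsilon})\}$ there is $\psi$ such that $s_0 \xrightarrow{\psi}_k$ with $\phi \approx \psi$.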
A system is existentially stable k-bounded if each of its executions leading to
a stable configuration can be re-ordered into a k-bounded execution (from $s_0$).


Theorem 5. (1) If S is existentially k-bounded, then it is existentially stable 
k-bounded. (2) If S is existentially stable k-bounded and has the stable property, 
then it is existentially k-bounded. 


We illustrate the relationship between existentially stable bounded communicating
automata and the other classes in Fig. 5. The example below further
illustrates the strictness of the inclusions, see Table 1 for a summary. 


Example 8. Consider the systems in Fig. 3. $(M_p, M_q)$ and $(M_p, N_q)$ are (trivially)
existentially stable 1-bounded since none of their (non-empty) executions
terminate in a stable configuration. The system $(M_p, N'_q)$ is existentially stable
2-bounded since each of its executions can be re-ordered into a 2-bounded one.
The system in Example 7 is (trivially) ∃S-1-bounded: none of its (non-empty)
executions terminate in a stable configuration (the b's are never received).


Theorem 6. Let S be an ∃(S)-k-bounded system with the stable property; then
it is k-exhaustive.


Table 1. Properties for key examples, where direct. stands for directed, OBI for k-OBI,
SIBI for k-SIBI, ER for eventual reception property, SP for stable property, exh. for
k-exhaustive, ∃(S)-b for ∃(S)-bounded, and syn. for n-synchronisable (for some $n \in \mathbb{N}_{>0}$).


#  System                  Ref.        k     direct.  OBI  SIBI  safe  ER   SP   exh.  ∃S-b  ∃-b  syn.
1  (M_c, M_s, M_l)         Figure 1    1     yes      yes  yes   yes   yes  yes  yes   yes   yes  yes
2  (M_s, M_q, M_r)         Example 2   1     yes      yes  yes   no    yes  yes  yes   yes   yes  yes
3  (M_p, M_q, M_r)         Figure 2    ≥ 2   no       yes  no    no    no   no   no    yes   yes  no
4  (M_p, M_q)              Figure 3    any   yes      yes  yes   yes   yes  no   no    yes   no   no
5  (M_p, N'_q)             Figure 3    2     yes      yes  yes   yes   yes  no   yes   yes   yes  no
6  (M_p, M_q, M_r, M_s)    Figure 4    2     no       yes  yes   yes   yes  no   yes   yes   yes  no
7  (M_s, M_r, M_p)         Example 7   any   yes      yes  yes   no    no   no   no    yes   yes  yes
8  (M_p, M_q)              Example 9   1     yes      yes  yes   yes   yes  yes  yes   yes   yes  no


4.3 Synchronisable Communicating Session Automata


In this section, we study the relationship between synchronisability [9] and k- 
exhaustivity via existential boundedness. Informally, communicating automata 
are synchronisable if each of their executions can be scheduled in such a way 
that it consists of sequences of “exchange phases”, where each phase consists of 
a bounded number of send actions, followed by a sequence of receive actions. 
The original definition of k-synchronisable systems [9, Definition 1] is based on 
communicating automata with mailbox semantics, i.e., each automaton has one
input queue. Here, we adapt the definition so that it matches our point-to-point
semantics. We write $A_!$ for $A \cap (C \times \{!\} \times \Sigma)$, and $A_?$ for $A \cap (C \times \{?\} \times \Sigma)$.


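Definition 17 (Synchronisability). A valid execution $\phi = \phi_1 \cdots \phi_n$ is a k-exchange
if and only if: (1) $\forall 1 \leq i \leq n : \phi_i \in A_!^* \cdot A_?^* \land |\phi_i| \leq 2k$; and
(2) $\forall pq \in C : \forall 1 \leq i \leq n : \pi^!_{pq}(\phi_i) \neq \pi^?_{pq}(\phi_i) \implies \forall i < j \leq n : \pi^?_{pq}(\phi_j) = \epsilon$.

We write $A^*\|_k$ for the set of executions that are k-exchanges and say that
an execution $\phi$ is k-synchronisable if $[\phi]_\approx \cap A^*\|_k \neq \emptyset$. A system S is
k-synchronisable if each execution in $\{\phi \mid \exists s : s_0 \xrightarrow{\phi} s\}$ is k-synchronisable.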


Table 2. Experimental evaluation. |P| is the number of participants, k is the bound, 
|RTS| is the number of transitions in the reduced $TS_k(S)$ (see [43]), direct. stands for
directed, Time is the time taken to check all the properties shown in this table, and 
GMC is yes if the system is generalised multiparty compatible [39]. 


Example                      |RTS|  direct.  k-OBI  k-CIBI  k-MC  Time     GMC
Client-Server-Logger         11     yes      yes    yes     yes   0.04 s   no
4 Player game† [39]          20     no       yes    yes     yes   0.05 s   yes
Bargain [39]                 8      yes      yes    yes     yes   0.03 s   yes
Filter collaboration [68]    10     yes      yes    yes     yes   0.03 s   yes
Alternating bit† [59]        8      yes      yes    yes     yes   0.04 s   no
TPMContract v2† [25]         14     yes      yes    yes     yes   0.04 s   yes
Sanitary agency† [60]        34     yes      yes    yes     yes   0.07 s   yes
Logistic† [54]               26     yes      yes    yes     yes   0.05 s   yes
Cloud system v4 [24]         16     no       yes    yes     yes   0.04 s   yes
Commit protocol [9]          12     yes      yes    yes     yes   0.03 s   yes
Elevator† [9]                       no       yes    no      yes   0.14 s   no
Elevator-dashed† [9]         80     no       yes    no      yes   0.16 s   no
Elevator-directed† [9]       41     yes      yes    yes     yes   0.07 s   yes
Dev system [58]              20     yes      yes    yes     yes   0.05 s   no
Fibonacci [48]               6      yes      yes    yes     yes   0.03 s   yes
Sap-Negot. [48,53]           18     yes      yes    yes     yes   0.04 s   yes
SH [48]                      30     yes      yes    yes     yes   0.06 s   yes
Travel agency [48,64]        21     yes      yes    yes     yes   0.05 s   yes
HTTP [29,48]                 48     yes      yes    yes     yes   0.07 s   yes
SMTP [30,48]                 108    yes      yes    yes     yes   0.08 s   yes
gen_server (buggy) [67]      56     no       no     yes     no    0.03 s   no
gen_server (fixed) [67]      45     no       yes    yes     yes   0.03 s   yes
Double buffering [45]        16     yes      yes    yes     yes   0.01 s   no


110 J. Lange and N. Yoshida 


Condition (1) says that execution $\phi$ should be a sequence of an arbitrary
number of send-receive phases, where each phase consists of at most 2k actions.
Condition (2) says that if a message is not received in the phase in which it is
sent, then it cannot be received in $\phi$. Observe that the bound k is on the number
of actions (over possibly different channels) in a phase rather than the number 
of pending messages in a given channel. 


Example 9. The system below (left) is 1-MC and ∃(S)-1-bounded, but it is not
k-synchronisable for any k. The subsequences of send-receive actions in the
$\approx$-equivalent executions below are highlighted (right).


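$M_p$: $\to \circ \xrightarrow{pq!a} \circ \xrightarrow{qp?c} \circ \xrightarrow{pq!b} \circ \xrightarrow{qp?d} \circ$
$M_q$: $\to \circ \xrightarrow{qp!c} \circ \xrightarrow{qp!d} \circ \xrightarrow{pq?a} \circ \xrightarrow{pq?b} \circ$

$\phi_1 = pq!a \cdot qp!c \cdot qp?c \cdot qp!d \cdot pq?a \cdot pq!b \cdot qp?d \cdot pq?b$
$\phi_2 = pq!a \cdot qp!c \cdot qp!d \cdot qp?c \cdot pq?a \cdot pq!b \cdot qp?d \cdot pq?b$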


Execution $\phi_1$ is 1-bounded for $s_0$, but it is not a k-exchange since, e.g., a is
received outside of the phase where it is sent. In $\phi_2$, message d is received outside
of its sending phase. In the terminology of [9], this system is not k-synchronisable
because there is a “receive-send dependency” between the exchange of messages
c and b, i.e., p must receive c before it sends b. Hence, there is no k-exchange
that is $\approx$-equivalent to $\phi_1$ and $\phi_2$.


Theorem 7. (1) If S is k-synchronisable, then it is ∃-k-bounded. (2) If S is
k-synchronisable and has the eventual reception property, then it is k-exhaustive.


Figure 5 and Table 1 summarise the results of Sect. 4 wrt. k-OBI and IBI CSA. 
We note that any finite-state system is k-exhaustive (and ∃(S)-k-bounded) for
sufficiently large k, while this does not hold for synchronisability, see Example 9. 


5 Experimental Evaluation 


We have implemented our theory in a tool [33] which takes two inputs: (i) a 
system of communicating automata and (ii) a bound MAX. The tool iteratively 
checks whether the system validates the premises of Theorem 1, until it succeeds 
or reaches k = MAX. We note that the k-OBI and IBI conditions are required
for our soundness result (Theorem 1), but are orthogonal for checking k-MC.
Each condition is checked on a reduced bounded transition system, called
$RTS_k(S)$. Each verification procedure for these conditions is implemented in
Haskell using a simple (depth-first-search based) reachability check on the paths
of $RTS_k(S)$. We give an (optimal) partial order reduction algorithm to construct
$RTS_k(S)$ in [43] and show that it preserves our properties.

We have tested our tool on 20 examples taken from the literature, which are 
reported in Table 2. The table shows that the tool terminates virtually instan- 
taneously on all examples. The table suggests that many systems are indeed 
k-MC and most can be easily adapted to validate bound independence. The last
column refers to the GMC condition, a form of synchronous multiparty compatibility
(SMC) introduced in [39]. The examples marked with † have been slightly
modified to make them CSA that validate k-OBI and IBI. For instance, we take
only one of the possible interleavings between mixed actions to remove mixed 
states (taking send action before receive action to preserve safety), see [43]. 

We have assessed the scalability of our approach with automatically generated
examples, which we report in Fig. 6. Each system considered in these benchmarks
consists of 2m (directed) CSA for some $m \geq 1$ such that $S = (M_{p_i})_{1 \leq i \leq 2m}$,
and each automaton $M_{p_i}$ is of the form (when i is odd):


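$M_{p_i}$: a chain of k sending steps, each offering the n transitions $p_ip_{i+1}!a_1, \ldots, p_ip_{i+1}!a_n$, followed by a chain of k receiving steps, each offering the n transitions $p_{i+1}p_i?a_1, \ldots, p_{i+1}p_i?a_n$.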


Each $M_{p_i}$ sends k messages to participant $p_{i+1}$, then receives k messages from
$p_{i+1}$. Each message is taken from an alphabet $\{a_1, \ldots, a_n\}$ ($n \geq 1$). $M_{p_i}$ has the
same structure when i is even, but interacts with $p_{i-1}$ instead. Observe that any
system constructed in this way is k-MC for any $k \geq 1$, $n \geq 1$, and $m \geq 1$. The
shape of these systems allows us to assess how our approach fares in the worst
case, i.e., with a large number of paths in $RTS_k(S)$. Figure 6 gives the time taken for
our tool to terminate (y axis) wrt. the number of transitions in $RTS_k(S)$, where
k is the least natural number for which the system is k-MC. The plot on the left
in Fig. 6 gives the timings when k is increasing (every increment from k = 2 to 
k = 100) with the other parameters fixed (n = 1 and m = 5). The middle plot 
gives the timings when m is increasing (every increment from m = 1 to m = 26) 
with k = 10 and n = 1. The right-hand side plot gives the timings when n is 
increasing (every increment from n = 1 to n = 10) with k = 2 and m = 1. The 
largest $RTS_k(S)$ on which we have tested our tool has 12222 states and 22220
transitions, and the verification took under 17 min.¹ Observe that partial order
reduction mitigates the increasing size of the transition system on which k-MC
is checked, e.g., these experiments show that parameters k and m have only a
linear effect on the number of transitions (see horizontal distances between data
points). However, the number of transitions increases exponentially with n (since
the number of paths in each automaton increases exponentially with n).
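The benchmark family described above is straightforward to generate. The Python sketch below uses hypothetical function names and our own encoding: automata are lists of `(source, (sender, receiver, kind, message), target)` transitions, and participants are named `p1`, ..., `p2m`. Each automaton is a chain of 2k + 1 states in which every step offers n alternative messages. Each automaton therefore has 2kn transitions, while the number of paths through it grows as $n^{2k}$.

```python
def bench_automaton(i, k, n):
    """Transitions of M_{p_i}: k send steps then k receive steps, each offering n messages."""
    peer = i + 1 if i % 2 == 1 else i - 1  # odd i talks to p_{i+1}, even i to p_{i-1}
    me, other = f"p{i}", f"p{peer}"
    trans = []
    for step in range(k):
        for j in range(1, n + 1):
            trans.append((step, (me, other, "!", f"a{j}"), step + 1))
    for step in range(k, 2 * k):
        for j in range(1, n + 1):
            trans.append((step, (other, me, "?", f"a{j}"), step + 1))
    return trans


def bench_system(k, n, m):
    """S = (M_{p_i}) for 1 <= i <= 2m, each automaton starting in state 0."""
    return {f"p{i}": (0, bench_automaton(i, k, n)) for i in range(1, 2 * m + 1)}


system = bench_system(k=2, n=1, m=5)
print(len(system), sum(len(t) for _, t in system.values()))  # 10 40
```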


6 Related Work 


Theory of Communicating Automata. Communicating automata were introduced,
and shown to be Turing powerful, in the 1980s [10] and have since then been
studied extensively, notably through their connection with message sequence
charts (MSC) [46]. Several works achieved decidability results by using bag or 
lossy channels [1,2,13,14] or by restricting the topology of the network [36,57]. 

Existentially bounded communicating automata stand out because they pre- 
serve the FIFO semantics of communicating automata, do not restrict the topol- 
ogy of the network, and include infinite state systems. Given a bound k and 


1 All the benchmarks in this paper were run on an 8-core Intel i7-7700 machine with 
16GB RAM running a 64-bit Linux. 
Fig. 6. Benchmarks: increasing k (left), increasing m (middle), and increasing n (right).


an arbitrary system of (deterministic) communicating automata S, it is gen- 
erally undecidable whether S is existentially k-bounded. However, the ques- 
tion becomes decidable (PSPACE-complete) when S has the stable property. 
The stable property is itself generally undecidable (it is called deadlock-freedom 
in [22,35]). Hence this class is not directly applicable to the verification of mes- 
sage passing programs since its membership is overall undecidable. We have 
shown that k-OBI, IBI, and k-exhaustive CSA systems are (strictly) included in 
the class of existentially bounded systems. Hence, our work gives a sound prac- 
tical procedure to check whether CSA are existentially k-bounded. To the best of 
our knowledge, the only tools dedicated to the verification of (unbounded) com- 
municating automata are McScM [26] and Chorgram [40]. Bouajjani et al. [9] 
study a variation of communicating automata with mailboxes (one input queue 
per automaton). They introduce the class of synchronisable systems and a pro- 
cedure to check whether a system is k-synchronisable; it relies on executions con- 
sisting of k-bounded exchange phases. Given a system and a bound k, it is decid- 
able (PSPACE-complete) whether its executions are equivalent to k-synchronous 
executions. Section 4.3 states that any k-synchronisable system which satisfies 
eventual reception is also k-exhaustive, see Theorem 7. In contrast to existen- 
tial boundedness, synchronisability does not include all finite-state systems. Our 
characterisation result, based on local bound-agnosticity (Theorem 3), is unique 
to k-exhaustivity. It does not apply to existential boundedness or synchronisability,
see, e.g., Example 7. The term “synchronizability” is used by Basu
et al. [3,4] to refer to another verification procedure for communicating automata 
with mailboxes. Finkel and Lozes [19] have shown that this notion of synchroniz- 
ability is undecidable. We note that a system that is safe with a point-to-point 
semantics, may not be safe with a mailbox semantics (due to independent send 
actions), and vice-versa. For instance, the system in Fig. 2 is safe when executed 
with mailbox semantics. 


Multiparty Compatibility and Programming Languages. The first definition of 
multiparty compatibility appeared in [18, Definition 4.2], inspired by the work 
in [23], to characterise the relationship between global types and communicating 
automata. This definition was later adapted to the setting of communicating 
timed automata in [6]. Lange et al. [39] introduced a generalised version of mul- 
tiparty compatibility (GMC) to support communicating automata that feature 


Verifying Asynchronous Interactions via Communicating Session Automata 113 


mixed or non-directed states. Because our results apply to automata without 
mixed states, k-MC is not a strict extension of GMC, and GMC is not a strict 
extension of k-MC either, as it requires the existence of synchronous executions. 
In future work, we plan to develop an algorithm to synthesise representative 
choreographies from k-MC systems, using the algorithm in [39]. 

The notion of multiparty compatibility is at the core of recent works that 
apply session types techniques to programming languages. Multiparty compat- 
ibility is used in [51] to detect deadlocks in Go programs, and in [30] to study 
the well-formedness of Scribble protocols [64] through the compatibility of their 
projections. These protocols are used to generate various endpoint APIs that 
implement a Scribble specification [30,31,48], and to produce runtime monitor- 
ing tools [47,49,50]. Taylor et al. [67] use multiparty compatibility and chore- 
ography synthesis [39] to automate the analysis of the gen_server library of 
Erlang/OTP. We can transparently widen the set of safe programs captured 
by these tools by using k-MC instead of synchronous multiparty compatibility 
(SMC). The k-MC condition corresponds to a much wider instance of the abstract
safety invariant φ for session types defined in [63]. Indeed, k-MC includes SMC
(see [43]) and all finite-state systems (for k sufficiently large).


7 Conclusions 


We have studied CSA via a new condition called k-exhaustivity. The k-exhaustivity
condition is (i) the basis for a wider notion of multiparty compatibility,
k-MC, which captures asynchronous interactions and (ii) the first practical,
empirically validated, sufficient condition for existential k-boundedness. We
have shown that k-exhaustive systems are fully characterised by local bound- 
agnosticity (each automaton behaves equivalently for any bound greater than 
or equal to k). This is a key requirement for asynchronous message passing 
programming languages where the possibility of having infinitely many orphan 
messages is undesirable, in particular for Go and Rust which provide bounded 
communication channels. 

For future work, we plan to extend our theory beyond CSA. We believe that it 
is possible to support mixed states and states which do not satisfy IBI, as long as 
their outgoing transitions are independent (i.e., if they commute). Additionally, 
to make k-MC checking more efficient, we will elaborate heuristics to find optimal 
bounds and off-load the verification of k-MC to an off-the-shelf model checker.
Abstract. HyperLTL is an extension of linear-time temporal logic
for the specification of hyperproperties, i.e., temporal properties that 
relate multiple computation traces. HyperLTL can express information 
flow policies as well as properties like symmetry in mutual exclusion 
algorithms or Hamming distances in error-resistant transmission pro- 
tocols. Previous work on HyperLTL model checking has focussed on 
the alternation-free fragment of HyperLTL, where verification reduces to 
checking a standard trace property over an appropriate self-composition 
of the system. The alternation-free fragment does not, however, cover
general hyperliveness properties. Universal formulas, for example, cannot
express the secrecy requirement that for every possible value of a
secret variable there exists a computation where the value is different 
while the observations made by the external observer are the same. In 
this paper, we study the more difficult case of hyperliveness properties 
expressed as HyperLTL formulas with quantifier alternation. We reduce 
existential quantification to strategic choice and show that synthesis algo- 
rithms can be used to eliminate the existential quantifiers automatically. 
We furthermore show that this approach can be extended to reactive 
system synthesis, i.e., to automatically construct a reactive system that 
is guaranteed to satisfy a given HyperLTL formula. 


1 Introduction 


HyperLTL [6] is a temporal logic for hyperproperties [7], i.e., for properties that 
relate multiple computation traces. Hyperproperties cannot be expressed in stan- 
dard linear-time temporal logic (LTL), because LTL can only express trace prop- 
erties, i.e., properties that characterize the correctness of individual computa- 
tions. Even branching-time temporal logics like CTL and CTL*, which quantify 
over computation paths, cannot express hyperproperties, because quantifying
over a second path automatically means that the subformula can no longer refer 
to the previously quantified path. HyperLTL addresses this limitation with quan- 
tifiers over trace variables, which allow the subformula to refer to all previously 
chosen traces. For example, noninterference [21] between a secret input h and 
a public output o can be specified in HyperLTL by requiring that all pairs of
traces $\pi$ and $\pi'$ that always have the same inputs except for h (i.e., all inputs in
$I \setminus \{h\}$ are equal on $\pi$ and $\pi'$) also have the same output o at all times:


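$$\forall \pi \forall \pi'.\; \Big(\square \bigwedge_{i \in I \setminus \{h\}} i_\pi = i_{\pi'}\Big) \rightarrow \square(o_\pi = o_{\pi'})$$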


This formula states that a change in the secret input h alone cannot cause any 
difference in the output o. 
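As a small illustration of quantifying over pairs of traces, the following Python sketch checks the noninterference formula on a finite set of finite trace prefixes. The trace encoding (a list of steps, each a dict from proposition names to Booleans), the two toy systems, and all function names are assumptions made for this example; checking finite prefixes only approximates the semantics over infinite traces.

```python
from itertools import product

# Assumed representation: a trace prefix is a list of steps, and each step
# maps proposition names ("h", "l", "o") to Booleans.

def agree_globally(t1, t2, props):
    """True iff t1 and t2 agree on every proposition in `props` at every position."""
    return all(s1[p] == s2[p] for s1, s2 in zip(t1, t2) for p in props)

def noninterference(traces, inputs, secret, output):
    """forall pi, pi'. G(low inputs equal) -> G(output equal), on finite prefixes."""
    low_inputs = set(inputs) - {secret}
    for t1, t2 in product(traces, repeat=2):  # every ordered pair (pi, pi')
        if agree_globally(t1, t2, low_inputs) and not agree_globally(t1, t2, {output}):
            return False  # (t1, t2) is a counterexample pair
    return True

def traces_of(out_fn, length=2):
    """All prefixes of a toy system whose output o is out_fn(h, l) at each step."""
    bits = [False, True]
    return [
        [{"h": h, "l": l, "o": out_fn(h, l)} for h, l in zip(hs, ls)]
        for hs in product(bits, repeat=length)
        for ls in product(bits, repeat=length)
    ]

leaky = traces_of(lambda h, l: h)   # output copies the secret
secure = traces_of(lambda h, l: l)  # output depends only on the public input
print(noninterference(leaky, {"h", "l"}, "h", "o"))   # False
print(noninterference(secure, {"h", "l"}, "h", "o"))  # True
```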

For certain properties of interest, the additional expressiveness of HyperLTL 
comes at no extra cost when considering the model checking problem. To check 
a property like noninterference, which only has universal trace quantifiers, one 
simply builds the self-composition of the system, which provides a separate copy 
of the state variables for each trace. Instead of quantifying over all pairs of traces, 
it then suffices to quantify over individual traces of the self-composed system, 
which can be done with standard LTL. Model checking universal formulas is 
NLOGSPACE-complete in the size of the system and PSPACE-complete in the 
size of the formula, which is precisely the same complexity as for LTL. 

Universal HyperLTL formulas suffice to express hypersafety properties like 
noninterference, but not hyperliveness properties that require, in general, quanti- 
fier alternation. A prominent example is generalized noninterference (GNI) [27], 
which can be expressed as the following HyperLTL formula: 


$$\forall \pi \forall \pi' \exists \pi''.\; \square(h_\pi = h_{\pi''}) \wedge \square(o_{\pi'} = o_{\pi''})$$


This formula requires that for every pair of traces $\pi$ and $\pi'$, there is a third trace $\pi''$ in the system that agrees with $\pi$ on $h$ and with $\pi'$ on $o$. The existence of an appropriate trace $\pi''$ ensures that in $\pi$ and $\pi'$, the value of $o$ is not determined by the value of $h$. Generalized noninterference stipulates that low-security outputs may not be altered by the injection of high-security inputs, while permitting nondeterminism in the low-observable behavior. The existential quantifier is needed to allow this nondeterminism. GNI is a hyperliveness property [7] even though the underlying LTL formula is a safety property. This is because any set of traces that violates GNI can be extended into a set of traces that satisfies GNI by adding, for each offending pair of traces $\pi, \pi'$, an appropriate trace $\pi''$.
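The hyperliveness argument can be replayed on finite traces. The following Python sketch, under the assumption that traces are finite tuples of $(h, o)$ pairs, checks GNI on a set of traces and then adds, for every pair $\pi, \pi'$, the trace $\pi''$ that combines the $h$-values of $\pi$ with the $o$-values of $\pi'$. The names `gni_holds` and `repair` are illustrative and not taken from any tool.

```python
from itertools import product

# Assumed encoding: a finite trace is a tuple of (h, o) pairs of Booleans.

def hs(trace):
    return [h for h, _ in trace]

def outs(trace):
    return [o for _, o in trace]

def gni_holds(traces):
    """forall pi, pi' exists pi''. G(h_pi = h_pi'') and G(o_pi' = o_pi'')."""
    return all(
        any(hs(t3) == hs(t1) and outs(t3) == outs(t2) for t3 in traces)
        for t1, t2 in product(traces, repeat=2)
    )

def repair(traces):
    """Add one witness pi'' per pair (pi, pi'): h-values from pi, o-values from pi'."""
    extended = set(traces)
    for t1, t2 in product(traces, repeat=2):
        extended.add(tuple((h, o) for (h, _), (_, o) in zip(t1, t2)))
    return extended

# A system whose output simply copies the secret violates GNI ...
leaky = {tuple((h, h) for h in seq) for seq in product([False, True], repeat=2)}
print(gni_holds(leaky))          # False
# ... but adding the missing witness traces yields a set that satisfies GNI.
print(gni_holds(repair(leaky)))  # True
```

A single round of `repair` suffices here: every $h$-sequence and every $o$-sequence in the extended set already occurs in the original set, so all required combinations are added at once.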

Hyperliveness properties also play an important role in applications beyond 
security. For example, robust cleanness [9] specifies that significant differences in 
the output behavior are only permitted after significant differences in the input: 


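$$\forall \pi \forall \pi' \exists \pi''.\; \square(i_{\pi'} = i_{\pi''}) \wedge \big(d(o_\pi, o_{\pi''}) \leq \kappa_o \;\mathcal{W}\; d(i_\pi, i_{\pi''}) > \kappa_i\big)$$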


The differences are measured by a distance function $d$ and compared to constant thresholds $\kappa_i$ for the input and $\kappa_o$ for the output. The formula specifies the existence of a trace $\pi''$ that globally agrees with $\pi'$ on the input and where the difference in the output $o$ between $\pi$ and $\pi''$ is bounded by $\kappa_o$ until the difference in the input $i$ between $\pi$ and $\pi''$ exceeds $\kappa_i$ (and forever if it never does). Robust cleanness, thus, forbids unexpected jumps in the system behavior that are, for example, due to software doping, while allowing for behavioral differences due to nondeterminism.
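To make the role of the weak until concrete, here is a hedged Python sketch that checks robust cleanness on a finite set of finite traces. The encoding of a trace as a list of integer $(i, o)$ pairs, the choice of the absolute difference as $d$, the thresholds, and the two toy systems (one of which jumps at input 3, mimicking doping) are all assumptions of this example.

```python
from itertools import product

# Assumed encoding: a finite trace is a list of (i, o) pairs of integers, and
# the distance d is the absolute difference; both are illustrative choices.

def weak_until(phi, psi):
    """phi W psi on a finite prefix: phi holds up to the first psi, or throughout."""
    for p, q in zip(phi, psi):
        if q:
            return True
        if not p:
            return False
    return True

def robustly_clean(traces, k_in, k_out):
    """forall pi, pi' exists pi''. G(i_pi' = i_pi'') and
    (d(o_pi, o_pi'') <= k_out  W  d(i_pi, i_pi'') > k_in)."""
    def witness(t, t2, t3):
        same_inputs = [i3 for i3, _ in t3] == [i2 for i2, _ in t2]
        out_close = [abs(o - o3) <= k_out for (_, o), (_, o3) in zip(t, t3)]
        in_far = [abs(i - i3) > k_in for (i, _), (i3, _) in zip(t, t3)]
        return same_inputs and weak_until(out_close, in_far)
    return all(
        any(witness(t, t2, t3) for t3 in traces)
        for t, t2 in product(traces, repeat=2)
    )

def traces_of(system, length=2, values=range(5)):
    """All input sequences of the given length, paired with the system's outputs."""
    return [[(i, system(i)) for i in ins] for ins in product(values, repeat=length)]

clean = traces_of(lambda i: 2 * i)
doped = traces_of(lambda i: 2 * i + (10 if i == 3 else 0))  # jump at input 3
print(robustly_clean(clean, k_in=1, k_out=2))  # True
print(robustly_clean(doped, k_in=1, k_out=2))  # False
```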

With quantifier alternation, the model checking problem becomes much more difficult. Model checking HyperLTL formulas of the form $\forall^*\exists^*\varphi$, where $\varphi$ is a quantifier-free formula, is PSPACE-complete in the size of the system and EXPSPACE-complete in the formula. The only known model checking algorithm replaces the existential quantifier with the negation of a universal quantifier over the negated subformula; but this requires a complementation of the system behavior, which is completely impractical for realistic systems.

In this paper, we present an alternative approach to the verification of hyperliveness properties. We view the model checking problem of a formula of the form $\forall \pi \exists \pi'.\, \varphi$ as a game between the $\forall$-player and the $\exists$-player. While the $\forall$-player moves through the state space of the system building trace $\pi$, the $\exists$-player must match each move in a separate traversal of the state space resulting in a trace $\pi'$ such that the pair $\pi, \pi'$ satisfies $\varphi$. Clearly, the existence of a winning strategy for the $\exists$-player implies that $\forall \pi \exists \pi'.\, \varphi$ is satisfied. The converse is not necessarily true: Even if there always is a trace $\pi'$ that matches the universally chosen trace $\pi$, the $\exists$-player may not be able to construct this trace, because she only knows about the choices made by the $\forall$-player in the finite prefix of $\pi$ that has occurred so far, and not the choices that will be made by the $\forall$-player in the infinite future. We address this problem by introducing prophecy variables into the system. Without changing the behavior of the system, the prophecy variables give the $\exists$-player the information about the future that is needed to make the right choice after seeing only the finite prefix. Such prophecy variables can be supplied manually by the user of the model checker as a lookahead on future moves of the $\forall$-player.

This game-theoretic approach provides an opportunity for the user to reduce the complexity of the model checking problem: If the user provides a strategy for the $\exists$-player, then the problem reduces to the cheaper model checking problem for universal properties. We show that such strategies can also be constructed automatically using synthesis. Beyond model checking, the game-theoretic approach also provides a method for the synthesis of systems that satisfy a conjunction of hypersafety and hyperliveness properties. Here, we not only synthesize the strategy, but also construct the system itself, i.e., the game graph on which the model checking game is played. While the synthesis from $\forall^*\exists^*$ hyperproperties is known to be undecidable in general, we show that the game-theoretic approach can naturally be integrated into bounded synthesis, which checks for the existence of a correct system up to a bound on the number of states.


Related Work. While the verification of general HyperLTL formulas has been studied before [6,17,18], there has been, so far, no practical model checking algorithm for HyperLTL formulas with quantifier alternation. The existing algorithm involves a complementation of the system automaton, which results in an exponential blow-up of the state space [18]. The only existing model checker for HyperLTL, MCHyper [18], was therefore, so far, limited to the alternation-free fragment. Although some hyperliveness properties lie in this fragment, quantifier alternation is needed to express general hyperliveness properties like GNI. In this paper, we present a technique to model check these hyperliveness properties and extend MCHyper to formulas with quantifier alternation.

The situation is similar in the area of reactive synthesis. There is a syn- 
thesis algorithm that automatically constructs implementations from HyperLTL 
specifications [13] using the bounded synthesis approach [20]. This algorithm is, 
however, also only applicable to the alternation-free fragment of HyperLTL. In 
this paper, we extend the bounded synthesis approach to HyperLTL formulas 
with quantifier alternation. Beyond the model checking and synthesis problems, 
the satisfiability [11,12,14] and monitoring [15,16,22] problems of HyperLTL 
have also been studied in the past. 

For certain information-flow security policies, there are verification tech- 
niques that use methods related to our model checking and synthesis algorithms. 
Specifically, the self-composition technique [2,3], a construction based on the 
product of copies of a system, has been tailored for various trace-based security 
definitions [10,23,28]. Unlike our algorithms, these techniques focus on specific 
information-flow policies, not on a general logic like HyperLTL. 

The use of prophecy variables [1] to make information about the future acces- 
sible is a known technique in the verification of trace properties. It is, for example, 
used to establish simulation relations between automata [26] or in the verification 
of CTL* properties [8]. 

In our game-theoretic view on the model checking problem for $\forall^*\exists^*$ hyperproperties, the $\exists$-player has an infinite lookahead. There is some work on finite lookahead on trace languages [24]. We use the idea of finite lookahead as an approximation to construct existential strategies and give a novel synthesis construction for strategies with delay based on bounded synthesis [20].


2 Preliminaries 


For tuples $x \in X^n$ and $y \in X^m$ over set $X$, we use $x \cdot y \in X^{n+m}$ to denote the concatenation of $x$ and $y$. Given a function $f\colon X \to Y$ and a tuple $x \in X^n$, we define by $f \circ x \in Y^n$ the tuple $(f(x[1]), \ldots, f(x[n]))$. Let $AP$ be a finite set of atomic propositions and let $\Sigma = 2^{AP}$ be the corresponding alphabet. A trace $t \in \Sigma^\omega$ is an infinite sequence of elements of $\Sigma$. We denote a set of traces by $Tr \subseteq \Sigma^\omega$. We define $t[i, \infty]$ to be the suffix of $t$ starting at position $i \geq 0$.


HyperLTL. HyperLTL [6] is a temporal logic for specifying hyperproperties. It extends LTL by quantification over trace variables $\pi$ and a method to link atomic propositions to specific traces. Let $\mathcal{V}$ be an infinite set of trace variables. Formulas in HyperLTL are given by the grammar


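$$
\begin{aligned}
\varphi &::= \forall \pi.\, \varphi \mid \exists \pi.\, \varphi \mid \psi, \text{ and}\\
\psi &::= a_\pi \mid \neg\psi \mid \psi \vee \psi \mid \bigcirc\psi \mid \psi \,\mathcal{U}\, \psi,
\end{aligned}
$$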
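where $a \in AP$ and $\pi \in \mathcal{V}$. We allow the standard Boolean connectives $\wedge$, $\rightarrow$, $\leftrightarrow$ as well as the derived LTL operators release $\varphi \,\mathcal{R}\, \psi \equiv \neg(\neg\varphi \,\mathcal{U}\, \neg\psi)$, eventually $\diamondsuit\varphi \equiv \mathit{true} \,\mathcal{U}\, \varphi$, globally $\square\varphi \equiv \neg\diamondsuit\neg\varphi$, and weak until $\varphi \,\mathcal{W}\, \psi \equiv \square\varphi \vee (\varphi \,\mathcal{U}\, \psi)$.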

We call a $Q^+ Q'^+ \psi$ HyperLTL formula (for $Q, Q' \in \{\forall, \exists\}$ and quantifier-free formula $\psi$) alternation-free iff $Q = Q'$. Further, we say that $Q^+ Q'^+ \psi$ has one quantifier alternation (or lies in the one-alternation fragment) iff $Q \neq Q'$.

The semantics of HyperLTL is given by the satisfaction relation $\models_{Tr}$ over a set of traces $Tr \subseteq \Sigma^\omega$. We define an assignment $\Pi\colon \mathcal{V} \to \Sigma^\omega$ that maps trace variables to traces. $\Pi[\pi \mapsto t]$ updates $\Pi$ by assigning variable $\pi$ to trace $t$.


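$$
\begin{array}{lcl}
\Pi, i \models_{Tr} a_\pi & \text{iff} & a \in \Pi(\pi)[i]\\
\Pi, i \models_{Tr} \neg\psi & \text{iff} & \Pi, i \not\models_{Tr} \psi\\
\Pi, i \models_{Tr} \psi_1 \vee \psi_2 & \text{iff} & \Pi, i \models_{Tr} \psi_1 \text{ or } \Pi, i \models_{Tr} \psi_2\\
\Pi, i \models_{Tr} \bigcirc\psi & \text{iff} & \Pi, i+1 \models_{Tr} \psi\\
\Pi, i \models_{Tr} \psi_1 \,\mathcal{U}\, \psi_2 & \text{iff} & \exists j \geq i.\; \Pi, j \models_{Tr} \psi_2 \wedge \forall i \leq k < j.\; \Pi, k \models_{Tr} \psi_1\\
\Pi, i \models_{Tr} \exists \pi.\, \varphi & \text{iff} & \text{there is some } t \in Tr \text{ such that } \Pi[\pi \mapsto t], i \models_{Tr} \varphi\\
\Pi, i \models_{Tr} \forall \pi.\, \varphi & \text{iff} & \text{for all } t \in Tr \text{ it holds that } \Pi[\pi \mapsto t], i \models_{Tr} \varphi
\end{array}
$$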


We write $Tr \models \varphi$ for $\{\}, 0 \models_{Tr} \varphi$ where $\{\}$ denotes the empty assignment.

Every hyperproperty is an intersection of a hypersafety and a hyperliveness property [7]. A hypersafety property is one where every set of traces that violates it has a bad prefix, i.e., a finite set of finite traces that cannot be extended into a set of traces that satisfies the hypersafety property. A hyperliveness property is a property where every finite set of finite traces can be extended to a possibly infinite set of infinite traces such that the resulting trace set satisfies the hyperliveness property.


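Transition Systems. We use transition systems as a model of computation for reactive systems. Transition systems consume sequences over an input alphabet by transforming their internal state in every step. Let $I$ and $O$ be finite sets of input and output propositions, respectively, and let $\Upsilon = 2^I$ and $\Gamma = 2^O$ be the corresponding finite alphabets. A $\Gamma$-labeled $\Upsilon$-transition system $\mathcal{S}$ is a tuple $\langle S, s_0, \tau, l\rangle$, where $S$ is a finite set of states, $s_0 \in S$ is the designated initial state, $\tau\colon S \times \Upsilon \to S$ is the transition function, and $l\colon S \to \Gamma$ is the state-labeling function. We write $s \xrightarrow{\upsilon} s'$ or $(s, \upsilon, s') \in \tau$ if $\tau(s, \upsilon) = s'$. We generalize the transition function to sequences over $\Upsilon$ by defining $\tau^*\colon \Upsilon^* \to S$ recursively as $\tau^*(\epsilon) = s_0$ and $\tau^*(\upsilon_0 \cdots \upsilon_{n-1}\upsilon_n) = \tau(\tau^*(\upsilon_0 \cdots \upsilon_{n-1}), \upsilon_n)$ for $\upsilon_0 \cdots \upsilon_{n-1}\upsilon_n \in \Upsilon^+$. Given an infinite word $\upsilon = \upsilon_0\upsilon_1 \ldots \in \Upsilon^\omega$, the transition system produces an infinite sequence of outputs $\gamma = \gamma_0\gamma_1 \ldots \in \Gamma^\omega$, such that $\gamma_i = l(\tau^*(\upsilon_0 \ldots \upsilon_{i-1}))$ for every $i \geq 0$. The resulting trace $\rho$ is $(\upsilon_0 \cup \gamma_0)(\upsilon_1 \cup \gamma_1) \ldots \in \Sigma^\omega$ where we have $AP = I \cup O$. The set of traces generated by $\mathcal{S}$ is denoted by $\mathit{traces}(\mathcal{S})$. Furthermore, we define $\varepsilon = \langle \{s\}, s, \tau_\varepsilon, l_\varepsilon\rangle$ as the transition system over $I = O = \emptyset$ that has only a single trace, that is, $\mathit{traces}(\varepsilon) = \{\emptyset^\omega\}$. For this transition system, $\tau_\varepsilon(s, \emptyset) = s$ and $l_\varepsilon(s) = \emptyset$. Given two transition systems $\mathcal{S} = \langle S, s_0, \tau, l\rangle$ and $\mathcal{S}' = \langle S', s_0', \tau', l'\rangle$, we define $\mathcal{S} \times \mathcal{S}' = \langle S \times S', (s_0, s_0'), \tau'', l''\rangle$ as the $\Gamma^2$-labeled $\Upsilon^2$-transition system where $\tau''((s, s'), (\upsilon, \upsilon')) = (\tau(s, \upsilon), \tau'(s', \upsilon'))$ and $l''((s, s')) = (l(s), l'(s'))$. A transition system $\mathcal{S}$ satisfies a general HyperLTL formula $\varphi$ if, and only if, $\mathit{traces}(\mathcal{S}) \models \varphi$.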
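The definitions of $\tau^*$, of the produced trace, and of the product can be transcribed almost literally. In the following Python sketch, the class `TS`, its field `label` (standing for $l$), and the toy system `delay` are illustrative assumptions; letters are frozensets of proposition names, so $\upsilon_i \cup \gamma_i$ becomes a set union.

```python
from dataclasses import dataclass
from typing import Any, Callable, FrozenSet, List, Sequence

Letter = FrozenSet[str]

@dataclass(frozen=True)
class TS:
    """A transition system <S, s0, tau, l>; the state set S is left implicit."""
    s0: Any
    tau: Callable[[Any, Any], Any]   # tau: S x Upsilon -> S
    label: Callable[[Any], Any]      # l: S -> Gamma

    def tau_star(self, word: Sequence[Any]) -> Any:
        """tau*(eps) = s0 and tau*(w v) = tau(tau*(w), v)."""
        s = self.s0
        for v in word:
            s = self.tau(s, v)
        return s

    def trace_prefix(self, word: Sequence[Letter]) -> List[Letter]:
        """Letters v_i | gamma_i with gamma_i = l(tau*(v_0 ... v_{i-1}))."""
        return [v | self.label(self.tau_star(word[:i])) for i, v in enumerate(word)]

def product(a: TS, b: TS) -> TS:
    """S x S': runs both systems side by side on pairs of inputs."""
    return TS(
        s0=(a.s0, b.s0),
        tau=lambda s, v: (a.tau(s[0], v[0]), b.tau(s[1], v[1])),
        label=lambda s: (a.label(s[0]), b.label(s[1])),
    )

I_, NONE = frozenset({"i"}), frozenset()
# Toy system: output o holds iff the previous input contained i.
delay = TS(s0=False, tau=lambda s, v: "i" in v,
           label=lambda s: frozenset({"o"}) if s else frozenset())
print(delay.trace_prefix([I_, NONE, I_]))
# [frozenset({'i'}), frozenset({'o'}), frozenset({'i'})]
pair = product(delay, delay)
print(pair.tau_star([(I_, NONE)]))  # (True, False)
```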
Automata. An alternating parity automaton $\mathcal{A}$ over a finite alphabet $\Sigma$ is a tuple $\langle Q, q_0, \delta, \alpha\rangle$, where $Q$ is a finite set of states, $q_0 \in Q$ is the designated initial state, $\delta\colon Q \times \Sigma \to \mathbb{B}^+(Q)$ is the transition function, and $\alpha\colon Q \to C$ is a function that maps states of $\mathcal{A}$ to a finite set of colors $C \subset \mathbb{N}$. For $C = \{0, 1\}$ and $C = \{1, 2\}$, we call $\mathcal{A}$ a co-Büchi and Büchi automaton, respectively, and we use the sets $F \subseteq Q$ and $B \subseteq Q$ to represent the rejecting (color $1$) and accepting (color $2$) states in the respective automaton (as a replacement of the coloring function $\alpha$). A safety automaton is a Büchi automaton where every state is accepting. The transition function $\delta$ maps a state $q \in Q$ and some $a \in \Sigma$ to a positive Boolean combination of successor states $\delta(q, a)$. An automaton is non-deterministic or universal if $\delta$ is purely disjunctive or conjunctive, respectively.

A run of an alternating automaton is a $Q$-labeled tree. A tree $T$ is a subset of $\mathbb{N}_{>0}^*$ such that for every node $n \in \mathbb{N}_{>0}^*$ and every positive integer $i \in \mathbb{N}_{>0}$, if $n \cdot i \in T$ then (i) $n \in T$ (i.e., $T$ is prefix-closed), and (ii) for every $0 < j < i$, $n \cdot j \in T$. The root of $T$ is the empty sequence $\epsilon$ and for a node $n \in T$, $|n|$ is the length of the sequence $n$, in other words, its distance from the root. A run of $\mathcal{A}$ on an infinite word $\rho \in \Sigma^\omega$ is a $Q$-labeled tree $(T, r)$ such that $r(\epsilon) = q_0$ and for every node $n \in T$ with children $n_1, \ldots, n_k$ the following holds: $1 \leq k \leq |Q|$ and $\{r(n_1), \ldots, r(n_k)\} \models \delta(q, \rho[i])$, where $q = r(n)$ and $i = |n|$. A path is accepting if the highest color appearing infinitely often is even. A run is accepting if all its paths are accepting. The language of $\mathcal{A}$, written $\mathcal{L}(\mathcal{A})$, is the set $\{\rho \in \Sigma^\omega \mid \mathcal{A} \text{ accepts } \rho\}$. A transition system $\mathcal{S}$ is accepted by an automaton $\mathcal{A}$, written $\mathcal{S} \models \mathcal{A}$, if $\mathit{traces}(\mathcal{S}) \subseteq \mathcal{L}(\mathcal{A})$.
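For an ultimately periodic path, the parity condition reduces to inspecting the loop, since exactly the states on the loop occur infinitely often. The following Python sketch (hypothetical helper names, a single path rather than a whole run tree) evaluates this condition and shows the colorings that correspond to co-Büchi and Büchi automata.

```python
from typing import Callable, Hashable, Sequence

def path_accepting(prefix: Sequence[Hashable], loop: Sequence[Hashable],
                   color: Callable[[Hashable], int]) -> bool:
    """Parity condition on the ultimately periodic path prefix . loop^omega."""
    if not loop:
        raise ValueError("an infinite path needs a non-empty loop")
    # Exactly the states on the loop occur infinitely often; the prefix is irrelevant.
    return max(color(q) for q in loop) % 2 == 0

def co_buchi(rejecting):
    """Colors {0, 1}: rejecting states (the set F) get the odd color 1."""
    return lambda q: 1 if q in rejecting else 0

def buchi(accepting):
    """Colors {1, 2}: accepting states (the set B) get the even color 2."""
    return lambda q: 2 if q in accepting else 1

cb = co_buchi({"bad"})
print(path_accepting(["bad"], ["ok"], cb))               # True: "bad" is visited once
print(path_accepting([], ["ok", "bad"], cb))             # False: "bad" recurs forever
print(path_accepting([], ["x", "acc"], buchi({"acc"})))  # True
```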


Strategies. Given two disjoint finite alphabets $\Upsilon$ and $\Gamma$, a strategy $\sigma\colon \Upsilon^* \to \Gamma$ is a mapping from finite histories of $\Upsilon$ to $\Gamma$. A transition system $\mathcal{S} = \langle S, s_0, \tau, l\rangle$ generates the strategy $\sigma$ if $\sigma(\upsilon) = l(\tau^*(\upsilon))$ for every $\upsilon \in \Upsilon^*$. A strategy $\sigma$ is called finite-state if there exists a transition system that generates $\sigma$.

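In the following, we use finite-state strategies to modify the inputs of transition systems. Let $\mathcal{S} = \langle S, s_0, \tau, l\rangle$ be a transition system over input and output alphabets $\Upsilon$ and $\Gamma$ and let $\sigma\colon (\Upsilon')^* \to \Upsilon$ be a finite-state strategy. Let $\mathcal{S}' = \langle S', s_0', \tau', l'\rangle$ be the transition system implementing $\sigma$; then $\mathcal{S} \parallel \sigma = \mathcal{S} \parallel \mathcal{S}'$ is the transition system $\langle S \times S', (s_0, s_0'), \tau^{\parallel}, l^{\parallel}\rangle$ where $\tau^{\parallel}\colon (S \times S') \times \Upsilon' \to (S \times S')$ is defined as $\tau^{\parallel}((s, s'), \upsilon') = (\tau(s, l'(s')), \tau'(s', \upsilon'))$ and $l^{\parallel}\colon (S \times S') \to \Gamma$ is defined as $l^{\parallel}(s, s') = l(s)$ for every $s \in S$, $s' \in S'$, and $\upsilon' \in \Upsilon'$.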
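The composition $\mathcal{S} \parallel \sigma$ can be sketched in Python as follows. The class `TS`, the functions `compose` and `outputs`, and the toy system and strategy are assumptions for illustration; the key point is that in every step the current label $l'(s')$ of the strategy is used as the input of $\mathcal{S}$, while the strategy itself reads $\upsilon'$.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence

@dataclass(frozen=True)
class TS:
    """Transition system <S, s0, tau, l> with an implicit state set."""
    s0: Any
    tau: Callable[[Any, Any], Any]
    label: Callable[[Any], Any]

def compose(system: TS, strategy: TS) -> TS:
    """S || S': the strategy's current label l'(s') is the input fed to S."""
    return TS(
        s0=(system.s0, strategy.s0),
        tau=lambda st, v2: (system.tau(st[0], strategy.label(st[1])),
                            strategy.tau(st[1], v2)),
        label=lambda st: system.label(st[0]),
    )

def outputs(ts: TS, word: Sequence[Any]) -> List[Any]:
    """The labels l(tau*(v_0 ... v_{i-1})) for i = 0 .. len(word) - 1."""
    result, s = [], ts.s0
    for v in word:
        result.append(ts.label(s))
        s = ts.tau(s, v)
    return result

# Toy system: outputs the bit it received in the previous step.
echo = TS(s0=False, tau=lambda s, v: v, label=lambda s: s)
# Toy strategy: remembers the last bit of v' and proposes its negation.
negate_last = TS(s0=False, tau=lambda s, v: v, label=lambda s: not s)
print(outputs(compose(echo, negate_last), [True, True, False]))  # [False, True, False]
```

In the example, the strategy proposes `True` before seeing anything and `False` after observing `True`; `echo` reproduces these inputs with a one-step delay after its initial output `False`.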


Model Checking HyperLTL. We recap the model checking of universal HyperLTL formulas. This case, as well as the dual case of only existential quantifiers, is well-understood and, in fact, efficiently implemented in the model checker MCHyper [18]. The principle behind the model checking approach is self-composition, where we check a standard trace property on a composition of an appropriate number of copies of the given system.

Let $\mathit{zip}$ denote the function that maps an $n$-tuple of sequences to a single sequence of $n$-tuples, for example, $\mathit{zip}([1, 2, 3], [4, 5, 6]) = [(1, 4), (2, 5), (3, 6)]$, and let $\mathit{unzip}$ denote its inverse. Given $\mathcal{S} = \langle S, s_0, \tau, l\rangle$, the $n$-fold self-composition of $\mathcal{S}$ is the transition system $\mathcal{S}^n = \langle S^n, s_0^n, \tau^n, l^n\rangle$, where $s_0^n = (s_0, \ldots, s_0) \in S^n$, $\tau^n(s, \upsilon) := \tau \circ \mathit{zip}(s, \upsilon)$ and $l^n(s) := l \circ s$ for every $s \in S^n$ and $\upsilon \in \Upsilon^n$. If $\mathit{traces}(\mathcal{S})$ is the set of traces generated by $\mathcal{S}$, then $\{\mathit{zip}(\rho_1, \ldots, \rho_n) \mid \rho_1, \ldots, \rho_n \in \mathit{traces}(\mathcal{S})\}$ is the set of traces generated by $\mathcal{S}^n$. We use the notation $\mathit{zip}(\varphi, \pi_1, \pi_2, \ldots, \pi_n)$ for some HyperLTL formula $\varphi$ to combine the trace variables $\pi_1, \pi_2, \ldots, \pi_n$ (occurring free in $\varphi$) into a fresh trace variable $\pi^*$.
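The following Python sketch mirrors $\mathit{zip}$, $\mathit{unzip}$, and the $n$-fold self-composition, and checks on a toy XOR system that running $\mathcal{S}^2$ on zipped inputs yields the zip of two individual runs. All names are illustrative, and the sketch works on finite prefixes only.

```python
from typing import Any, Callable, List, Sequence, Tuple

def zip_seqs(*seqs: Sequence[Any]) -> List[Tuple[Any, ...]]:
    """zip: an n-tuple of sequences -> a sequence of n-tuples."""
    return list(zip(*seqs))

def unzip_seq(tuples: Sequence[Tuple[Any, ...]]) -> Tuple[List[Any], ...]:
    """unzip: inverse of zip_seqs for a non-empty sequence of n-tuples."""
    return tuple(list(column) for column in zip(*tuples))

def self_compose(s0: Any, tau: Callable, label: Callable, n: int):
    """n-fold self-composition: s0^n, tau^n = tau o zip(s, v), l^n = l o s."""
    def tau_n(states, inputs):
        return tuple(tau(s, v) for s, v in zip(states, inputs))
    def label_n(states):
        return tuple(label(s) for s in states)
    return (s0,) * n, tau_n, label_n

def run(s0, tau, label, word):
    """Output labels produced while reading `word`, starting in s0."""
    out, s = [], s0
    for v in word:
        out.append(label(s))
        s = tau(s, v)
    return out

print(zip_seqs([1, 2, 3], [4, 5, 6]))       # [(1, 4), (2, 5), (3, 6)]
print(unzip_seq([(1, 4), (2, 5), (3, 6)]))  # ([1, 2, 3], [4, 5, 6])

# Toy system: its state (and output) is the XOR of all inputs read so far.
xor_tau, ident = (lambda s, v: s ^ v), (lambda s: s)
w1, w2 = [1, 0, 1, 1], [0, 0, 1, 0]
s0n, tau2, label2 = self_compose(0, xor_tau, ident, 2)
# Running S^2 on zipped inputs gives the zip of the two individual runs.
print(run(s0n, tau2, label2, zip_seqs(w1, w2)) ==
      zip_seqs(run(0, xor_tau, ident, w1), run(0, xor_tau, ident, w2)))  # True
```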


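Theorem 1 (Self-composition for universal HyperLTL formulas [18]). For a transition system $\mathcal{S}$ and a HyperLTL formula of the form $\forall \pi_1 \forall \pi_2 \ldots \forall \pi_n.\, \varphi$ it holds that $\mathcal{S} \models \forall \pi_1 \forall \pi_2 \ldots \forall \pi_n.\, \varphi$ iff $\mathcal{S}^n \models \forall \pi^*.\, \mathit{zip}(\varphi, \pi_1, \pi_2, \ldots, \pi_n)$.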


Theorem 2 (Complexity of model checking universal formulas [18]). The model checking problem for universal HyperLTL formulas is PSPACE-complete in the size of the formula and NLOGSPACE-complete in the size of the transition system.


The complexity of verifying universal HyperLTL formulas is exactly the same 
as the complexity of verifying LTL formulas. For HyperLTL formulas with quan- 
tifier alternations, the model checking problem is significantly more difficult. 


Theorem 3 (Complexity of model checking formulas with one quan- 
tifier alternation [18]). The model checking problem for HyperLTL formulas 
with one quantifier alternation is in EXPSPACE in the size of the formula and 
in PSPACE in the size of the transition system. 


One way to circumvent this complexity is to fix the existential choice and 
strengthen the formula to the universal fragment [9, 13,18]. While avoiding the 
complexity problem, this transformation requires deep knowledge of the system, 
is prone to errors, and cannot be verified automatically as the problem of check- 
ing implications becomes undecidable [11]. In the following section, we present a 
technique that circumvents the complexity problem while still inheriting strong 
correctness guarantees. Further, we provide a method that can, under certain 
restrictions, derive a strategy for the existential choice automatically. 


3 Model Checking with Quantifier Alternations 


3.1 Model Checking with Given Strategies 


Our first goal is the verification of HyperLTL formulas with one quantifier alternation, i.e., formulas of the form $\forall^*\exists^*\varphi$ or $\exists^*\forall^*\varphi$, where $\varphi$ is a quantifier-free formula. Note that the presented techniques can, similar to skolemization, be extended to more than one quantifier alternation. Quantifier alternation introduces dependencies between the quantified traces. In a $\forall^*\exists^*\varphi$ formula, the choices of the existential quantifiers depend on the choices of the universal quantifiers preceding them. In a formula of the form $\exists^*\forall^*\varphi$, however, there has to be a single choice for the existential quantifiers that works for all choices of the universal quantifiers. In this case, the existentially quantified variables do not depend on the universally quantified variables. Hence, the witnesses for the existential quantifiers are traces rather than functions that map tuples of traces to traces. As established above, the model checking problem for HyperLTL formulas with quantifier alternation is known to be significantly more difficult than the model checking problem for universal formulas.

Our verification technique for formulas with quantifier alternation is to sub- 
stitute strategic choice for existential choice. As discussed in the introduction, 
the existence of a strategy implies the existence of a trace. 


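Theorem 4 (Substituting Strategic Choice for Existential Choice). Let $\mathcal{S}$ be a transition system over input alphabet $\Upsilon$.

It holds that $\mathcal{S} \models \forall \pi_1 \forall \pi_2 \ldots \forall \pi_n.\, \exists \pi_1' \exists \pi_2' \ldots \exists \pi_m'.\, \varphi$ if there is a strategy $\sigma\colon (\Upsilon^n)^* \to \Upsilon^m$ such that $\mathcal{S}^n \times (\mathcal{S}^m \parallel \sigma) \models \forall \pi^*.\, \mathit{zip}(\varphi, \pi_1, \pi_2, \ldots, \pi_n, \pi_1', \pi_2', \ldots, \pi_m')$.

It holds that $\mathcal{S} \models \exists \pi_1 \exists \pi_2 \ldots \exists \pi_m.\, \forall \pi_1' \forall \pi_2' \ldots \forall \pi_n'.\, \varphi$ if there is a strategy $\sigma\colon (\Upsilon^0)^* \to \Upsilon^m$ such that $(\mathcal{S}^m \parallel \sigma) \times \mathcal{S}^n \models \forall \pi^*.\, \mathit{zip}(\varphi, \pi_1, \pi_2, \ldots, \pi_m, \pi_1', \pi_2', \ldots, \pi_n')$.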


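Proof. Let $\sigma$ be such a strategy. We define a witness for the existential trace quantifiers $\exists \pi_1' \exists \pi_2' \ldots \exists \pi_m'$ as the sequence of inputs $v = v_0 v_1 \ldots \in (\Upsilon^m)^\omega$ such that $v_i = \sigma(v_0' v_1' \ldots v_{i-1}')$ for every $i \geq 0$ and every $v_i' \in \Upsilon^n$, where $v_0' v_1' \ldots$ is the sequence of inputs of the universally quantified traces; analogously, we define a witness for the existential trace quantifiers $\exists \pi_1 \exists \pi_2 \ldots \exists \pi_m$ as the sequence of inputs $v = v_0 v_1 \ldots \in (\Upsilon^m)^\omega$ such that $v_i = \sigma(v_0' v_1' \ldots v_{i-1}')$ for every $i \geq 0$ and every $v_i' \in \Upsilon^0$.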


An application of the theorem reduces the verification problem of a HyperLTL 
formula with one quantifier alternation to the verification problem of a universal 
HyperLTL formula. If a sufficiently small strategy can be found, the reduction 
in complexity is substantial: 


Corollary 1 (Model checking with Given Strategies). The model checking problem for HyperLTL formulas with one quantifier alternation and given strategies for the existential quantifiers is in PSPACE in the size of the formula and in NLOGSPACE in the size of the product of the strategy and the system.


Note that the converse of Theorem 4 is not true in general. The satisfaction of a $\forall^*\exists^*$ HyperLTL formula does not imply the existence of a strategy, because at any given point in time the strategy only knows about a finite prefix of the universally quantified traces. Consider the formula $\forall \pi \exists \pi'.\, \square(a_{\pi'} \leftrightarrow \bigcirc a_\pi)$ and a system that can produce arbitrary sequences of $a$ and $\neg a$. Although the system satisfies the formula, it is not possible to give a strategy that allows us to prove this fact. Whatever choice our strategy makes, the next move of the $\forall$-player can make sure that the strategy's choice was wrong. In the following, we present a method that addresses this problem.
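The argument can be simulated. In the Python sketch below, which is only an illustration, a lookahead-free strategy sees $a_\pi$ up to and including the current position (even more information than the strategies of Theorem 4 receive) and commits to $a_{\pi'}$; the $\forall$-player then sets the next value of $a_\pi$ to the opposite of the guess. The three sample strategies are arbitrary choices.

```python
from typing import Callable, List, Tuple

# A lookahead-free strategy sees the forall-player's values of a_pi up to the
# current position and must commit to a_pi' at that position.
Strategy = Callable[[Tuple[bool, ...]], bool]

def play(strategy: Strategy, steps: int) -> Tuple[List[bool], List[bool]]:
    a_pi: List[bool] = [False]  # forall-player's first move (arbitrary)
    a_pi2: List[bool] = []
    for _ in range(steps):
        guess = strategy(tuple(a_pi))  # value of a_pi' at the current position
        a_pi2.append(guess)
        a_pi.append(not guess)         # forall-player refutes the guess next step
    return a_pi, a_pi2

def violations(a_pi: List[bool], a_pi2: List[bool]) -> List[int]:
    """Positions i where G(a_pi' <-> X a_pi) fails, i.e. a_pi'[i] != a_pi[i+1]."""
    return [i for i in range(len(a_pi2)) if a_pi2[i] != a_pi[i + 1]]

strategies = {
    "always true": lambda seen: True,
    "copy current": lambda seen: seen[-1],
    "alternate": lambda seen: len(seen) % 2 == 0,
}
for name, s in strategies.items():
    print(name, violations(*play(s, 5)))  # every position 0..4 is violated
```

Each sample strategy is refuted at every position; the same construction defeats any strategy that cannot look ahead.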


Prophecy Variables. A classic technique for resolving future dependencies 
is the introduction of prophecy variables [1]. Prophecy variables are auxiliary 
variables that are added to the system without affecting the behavior of the 
system. Such variables can be used to make predictions about the future. 

We use prophecy variables to define strategies that depend on the future. In the example discussed above, $\forall \pi \exists \pi'.\, \square(a_{\pi'} \leftrightarrow \bigcirc a_\pi)$, the choice of the value of $a_{\pi'}$ in the first position depends on the value of $a_\pi$ in the second position. We introduce a prophecy variable $p$ that predicts in the first position whether $a_\pi$ is true in the second position. With the prophecy variable, there exists a strategy that correctly assigns the value of $a_{\pi'}$ whenever the prediction is correct: The strategy chooses to set $a_{\pi'}$ if, and only if, $p$ holds.

Technically, the proof technique introduces a set of fresh input variables $P$ into the system. For a $\Gamma$-labeled $\Upsilon$-transition system $\mathcal{S} = \langle S, s_0, \tau, l\rangle$, we define the $\Gamma$-labeled $2^{I \cup P}$-transition system $\mathcal{S}^P = \langle S, s_0, \tau^P, l\rangle$ including the inputs $P$, where $\tau^P\colon S \times 2^{I \cup P} \to S$. For all $s \in S$ and $\upsilon^P \in 2^{I \cup P}$, $\tau^P(s, \upsilon^P) = \tau(s, \upsilon)$ for the $\upsilon \in \Upsilon$ obtained by removing the variables in $P$ from $\upsilon^P$ (i.e., $\upsilon = \upsilon^P \setminus P$). Moreover, the proof technique modifies the specification so that the original property only needs to be satisfied if the prediction is actually correct. We obtain the modified specification $\forall \pi \exists \pi'.\, \square(p_\pi \leftrightarrow \bigcirc a_\pi) \rightarrow \square(a_{\pi'} \leftrightarrow \bigcirc a_\pi)$ in our example. The following theorem describes the general technique for one prophecy variable.


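Theorem 5 (Model checking with Prophecy Variables). For a transition system $\mathcal{S}$ and a quantifier-free formula $\varphi$, let $\psi$ be a quantifier-free formula over the universally quantified trace variables $\pi_1, \pi_2, \ldots$ and let $p$ be a fresh atomic proposition. It holds that $\mathcal{S} \models \forall \pi_1 \forall \pi_2 \ldots \forall \pi_n.\, \exists \pi_1' \exists \pi_2' \ldots \exists \pi_m'.\, \varphi$ if, and only if, $\mathcal{S}^{\{p\}} \models \forall \pi_1 \forall \pi_2 \ldots \forall \pi_n.\, \exists \pi_1' \exists \pi_2' \ldots \exists \pi_m'.\, \square(p_{\pi_1} \leftrightarrow \psi) \rightarrow \varphi$.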


Note that $\psi$ is restricted to refer only to universally quantified trace variables. Without this restriction, the method would not be sound. In our example, $\psi = a_{\pi'}$ would lead to the modified formula $\forall \pi \exists \pi'.\, \square(p_\pi \leftrightarrow a_{\pi'}) \rightarrow \square(a_{\pi'} \leftrightarrow \bigcirc a_\pi)$, which could be satisfied with the strategy that assigns $a_{\pi'}$ to true iff $p_\pi$ is false, and thus falsifies the assumption that the prediction is correct, rather than ensuring that the original formula is true.
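The following Python sketch contrasts the two choices of $\psi$ on finite prefixes (an approximation of the infinite-trace semantics). It enumerates all values of $a_\pi$ and of the prophecy $p_\pi$ and checks the modified specification for two hypothetical strategies. As in the example above, the strategy reads $p_\pi$ at the current position.

```python
from itertools import product

def modified_spec_holds(a, p, a2, psi):
    """G(p_pi <-> psi) -> G(a_pi' <-> X a_pi), on positions 0..n-2 of a prefix."""
    n = len(a)
    assumption = all(p[i] == psi(a, a2, i) for i in range(n - 1))
    conclusion = all(a2[i] == a[i + 1] for i in range(n - 1))
    return (not assumption) or conclusion

psi_sound = lambda a, a2, i: a[i + 1]  # X a_pi: mentions only the universal trace
psi_unsound = lambda a, a2, i: a2[i]   # a_pi': mentions the existential trace

def strategy_wins(strategy, psi, n=4):
    """Does `strategy` satisfy the modified spec for every a_pi and p_pi prefix?"""
    for a, p in product(product([False, True], repeat=n), repeat=2):
        a2 = [strategy(p, i) for i in range(n)]
        if not modified_spec_holds(a, p, a2, psi):
            return False
    return True

follow_p = lambda p, i: p[i]     # set a_pi' iff p holds (the intended strategy)
dodge_p = lambda p, i: not p[i]  # set a_pi' iff p does not hold

print(strategy_wins(follow_p, psi_sound))   # True
print(strategy_wins(dodge_p, psi_sound))    # False
print(strategy_wins(dodge_p, psi_unsound))  # True, but only vacuously
```

With $\psi = \bigcirc a_\pi$, only the intended strategy wins; with $\psi = a_{\pi'}$, the dodging strategy also wins, but only because it makes the assumption false.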


Proof. It is easy to see that the original specification implies the modified specification, since the original formula is the conclusion of the implication. Assume that the modified specification holds. Since the prophecy variable $p$ is a fresh atomic proposition, and $\psi$ does not refer to the existentially chosen traces, we can, for every choice of the universally quantified traces, always choose the value of $p$ such that it guesses correctly, i.e., that $p$ is true exactly when $\psi$ holds. In this case, the conclusion and therefore the original specification must be true.


Unfortunately, prophecy variables do not provide a complete proof technique. 
Consider a system allowing arbitrary sequences of a and b and this specification: 


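$$
\begin{aligned}
\forall \pi \exists \pi'.\;& b_{\pi'} \wedge \square(b_{\pi'} \leftrightarrow \bigcirc \neg b_{\pi'})\\
&\wedge \big(a_{\pi'} \rightarrow (a_\pi \,\mathcal{W}\, (b_{\pi'} \wedge \neg a_\pi))\big)\\
&\wedge \big(\neg a_{\pi'} \rightarrow (a_\pi \,\mathcal{W}\, (\neg b_{\pi'} \wedge \neg a_\pi))\big)
\end{aligned}
$$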


Intuitively, $\pi'$ has to be able to predict whether $\pi$ will stop outputting $a$ at an even or odd position of the trace. There is no HyperLTL formula to be used as $\psi$ in Theorem 5, because, like LTL, HyperLTL can only express non-counting properties. It is worth noting that in our practical experiments, the incompleteness was never a problem. In many cases, it is not even necessary to add prophecy variables at all. The presented proof technique is, thus, practically useful despite this incompleteness result.


3.2 Model Checking with Synthesized Strategies 


We now extend the model checking approach with the automatic synthesis of the strategies for the existential quantifiers. For a given HyperLTL formula of the form $\forall^n \exists^m \varphi$ and a transition system $\mathcal{S}$, we search for a transition system $\mathcal{S}_\exists = \langle X, x_0, \mu, l_\exists\rangle$, where $X$ is a set of states, $x_0 \in X$ is the designated initial state, $\mu\colon X \times \Upsilon^n \to X$ is the transition function, and $l_\exists\colon X \to \Upsilon^m$ is the labeling function, such that $\mathcal{S}^n \times (\mathcal{S}^m \parallel \mathcal{S}_\exists) \models \mathit{zip}(\varphi)$. (Since for formulas of the form $\exists^m \forall^n \varphi$ the problem only differs in the input of $\mathcal{S}_\exists$, we focus on $\forall^*\exists^*$ HyperLTL.)


Theorem 6. The strategy realizability problem for $\forall^*\exists^*$ formulas is 2EXPTIME-complete.


Proof (Sketch). We reduce the strategy synthesis problem to the problem of 
synthesizing a distributed reactive system with a single black-box process. This 
problem is decidable [19] and can be solved in 2EXPTIME. The lower bound 
follows from the LTL realizability problem [30]. 


The decidability result implies that there is an upper bound on the size of $\mathcal{S}_\exists$ that is doubly exponential in $\varphi$. Thus, the bounded synthesis approach [20] can be used to search for increasingly larger implementations, until a solution is found or the maximal bound is reached, yielding an efficient decision procedure for the strategy synthesis problem. In the following, we describe this approach in detail.


Bounded Synthesis of Strategies. We transform the synthesis problem into an SMT constraint satisfaction problem, where we leave the representation of strategies uninterpreted and challenge the solver to provide an interpretation. Given a HyperLTL formula $\forall^n \exists^m \varphi$ where $\varphi$ is quantifier-free, the model checking is based on the product of the $n$-fold self-composition of the transition system $\mathcal{S}$, the $m$-fold self-composition of $\mathcal{S}$ where the strategy $\mathcal{S}_\exists$ controls the inputs, and the universal co-Büchi automaton $\mathcal{A}_\varphi$ representing the language $\mathcal{L}(\varphi)$ of $\varphi$.

For a quantifier-free HyperLTL formula $\varphi$, we construct the universal co-Büchi automaton $\mathcal{A}_\varphi$ such that $\mathcal{L}(\mathcal{A}_\varphi)$ is the set of words $w$ such that $\mathit{unzip}(w) \models \varphi$, i.e., the tuple of traces satisfies $\varphi$. We get this automaton by dualizing the non-deterministic Büchi automaton for $\neg\varphi$ [6], i.e., changing the branching from non-deterministic to universal and the acceptance condition from Büchi to co-Büchi. Hence, $\mathcal{S}$ satisfies a universal HyperLTL formula $\forall \pi_1 \ldots \forall \pi_n.\, \varphi$ if the traces generated by the self-composition $\mathcal{S}^n$ are a subset of $\mathcal{L}(\mathcal{A}_\varphi)$.

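In more detail, the algorithm searches for a transition system $\mathcal{S}_\exists = \langle X, x_0, \mu, l_\exists\rangle$ such that the run graph of $\mathcal{S}^n$, $\mathcal{S}^m \parallel \mathcal{S}_\exists$, and $\mathcal{A}_\varphi$, written $\mathcal{S}^n \times (\mathcal{S}^m \parallel \mathcal{S}_\exists) \times \mathcal{A}_\varphi$, is accepting. Formally, given a $\Gamma$-labeled $\Upsilon$-transition system $\mathcal{S} = \langle S, s_0, \tau, l\rangle$ and a universal co-Büchi automaton $\mathcal{A}_\varphi = \langle Q, q_0, \delta, F\rangle$, where $\delta\colon Q \times \Upsilon^{n+m} \times \Gamma^{n+m} \to 2^Q$, the run graph $\mathcal{S}^n \times (\mathcal{S}^m \parallel \mathcal{S}_\exists) \times \mathcal{A}_\varphi$ is the directed graph $(V, E)$, with the set of vertices $V = S^n \times S^m \times X \times Q$, initial vertex $v_{\mathit{init}} = ((s_0, \ldots, s_0), (s_0, \ldots, s_0), x_0, q_0)$, and the edge relation $E \subseteq V \times V$ satisfying $((s_n, s_m, x, q), (s_n', s_m', x', q')) \in E$ if, and only if

$$
\exists \upsilon \in \Upsilon^n.\; s_n' = \tau^n(s_n, \upsilon) \wedge s_m' = \tau^m(s_m, l_\exists(x)) \wedge x' = \mu(x, \upsilon) \wedge q' \in \delta\big(q, \upsilon \cdot l_\exists(x), l^n(s_n) \cdot l^m(s_m)\big).
$$

Theorem 7. Let $\mathcal{S}$ and $\mathcal{S}_\exists$ be given, and let $\forall^n \exists^m \varphi$ be a HyperLTL formula where $\varphi$ is quantifier-free. Let $\mathcal{A}_\varphi$ be the universal co-Büchi automaton for $\varphi$. If the run graph $\mathcal{S}^n \times (\mathcal{S}^m \parallel \mathcal{S}_\exists) \times \mathcal{A}_\varphi$ is accepting, then $\mathcal{S} \models \forall^n \exists^m \varphi$.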


Proof. Follows from Theorem 4 and the fact that $\mathcal{A}_\varphi$ represents $\mathcal{L}(\varphi)$.


The acceptance of a run graph is witnessed by an annotation $\lambda\colon V \to \mathbb{N} \cup \{\bot\}$, which is a function mapping every reachable vertex $v \in V$ in the run graph to a natural number $\lambda(v)$, i.e., $\lambda(v) \neq \bot$. Intuitively, $\lambda(v)$ returns the number of visits to rejecting states on any path from the initial vertex $v_{\mathit{init}}$ to $v$. If we can bound this number for every reachable vertex, the annotation is valid and the run graph is accepting. Formally, an annotation $\lambda$ is valid if (1) the initial vertex is reachable ($\lambda(v_{\mathit{init}}) \neq \bot$) and (2) for every $(v, v') \in E$ with $\lambda(v) \neq \bot$ it holds that $\lambda(v') \neq \bot$ and $\lambda(v') \rhd \lambda(v)$, where $\rhd$ is $>$ if $v'$ is rejecting and $\geq$ otherwise. Such an annotation exists if, and only if, the run graph is accepting [20].
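For an explicitly given finite run graph, a least valid annotation can be computed by a longest-path relaxation in which each edge into a rejecting vertex has weight 1; if the counts exceed the number of reachable rejecting vertices, a rejecting vertex lies on a reachable cycle and no valid annotation exists. The Python sketch below is an explicit-state illustration under these assumptions, not the SMT encoding used in this section; the graph encoding and all names are hypothetical.

```python
from typing import Dict, Hashable, List, Optional, Set

Graph = Dict[Hashable, List[Hashable]]  # successor lists of the run graph

def reachable(init: Hashable, edges: Graph) -> Set[Hashable]:
    seen, stack = {init}, [init]
    while stack:
        v = stack.pop()
        for w in edges.get(v, []):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def least_annotation(init, edges: Graph, rejecting: Set) -> Optional[Dict[Hashable, int]]:
    """Smallest valid annotation, or None if a reachable cycle visits a rejecting vertex."""
    reach = reachable(init, edges)
    bound = len(reach & rejecting)  # no path can count more rejecting visits than this
    lam = {v: 0 for v in reach}     # unreachable vertices are implicitly bottom
    changed = True
    while changed:  # longest-path relaxation, weight 1 on edges into rejecting vertices
        changed = False
        for v in reach:
            for w in edges.get(v, []):
                need = lam[v] + (1 if w in rejecting else 0)
                if need > lam[w]:
                    if need > bound:
                        return None  # counts grow without bound: the run graph rejects
                    lam[w] = need
                    changed = True
    return lam

def is_valid(init, edges: Graph, rejecting: Set, lam: Dict[Hashable, int]) -> bool:
    """Conditions (1) and (2); vertices missing from `lam` play the role of bottom."""
    if init not in lam:
        return False
    for v in lam:
        for w in edges.get(v, []):
            if w not in lam:
                return False
            if lam[w] < lam[v] + (1 if w in rejecting else 0):  # > if rejecting, >= otherwise
                return False
    return True

ok = {"init": ["r"], "r": ["c"], "c": ["c"]}   # rejecting r is visited once
bad = {"init": ["a"], "a": ["b"], "b": ["a"]}  # rejecting b lies on a cycle
lam = least_annotation("init", ok, {"r"})
print(lam == {"init": 0, "r": 1, "c": 1}, is_valid("init", ok, {"r"}, lam))  # True True
print(least_annotation("init", bad, {"b"}))  # None
```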

We encode the search for $\mathcal{S}_\exists$ and the annotation $\lambda$ as an SMT constraint system. To this end, we use uninterpreted function symbols to encode $\mathcal{S}_\exists$ and $\lambda$. A transition system $\mathcal{S}$ is represented in the constraint system by two functions, the transition function $\tau\colon S \times \Upsilon \to S$ and the labeling function $l\colon S \to \Gamma$. The annotation is split into two parts, a reachability constraint $\lambda^{\mathbb{B}}\colon V \to \mathbb{B}$ indicating whether a state in the run graph is reachable and a counter $\lambda^{\#}\colon V \to \mathbb{N}$ that maps every reachable vertex $v$ to the maximal number of rejecting states $\lambda^{\#}(v)$ visited by any path from the initial vertex to $v$. The resulting constraint asserts that there is a transition system $\mathcal{S}_\exists$ with an accepting run graph. Note that the functions representing the system $\mathcal{S}$ ($\tau\colon S \times \Upsilon \to S$ and $l\colon S \to \Gamma$) are given, that is, they are interpreted.


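$$
\begin{aligned}
&\exists \lambda^{\mathbb{B}}\colon S^n \times S^m \times X \times Q \to \mathbb{B}.\;
\exists \lambda^{\#}\colon S^n \times S^m \times X \times Q \to \mathbb{N}.\;
\exists \mu\colon X \times \Upsilon^n \to X.\;
\exists l_\exists\colon X \to \Upsilon^m.\\
&\forall \upsilon \in \Upsilon^n.\; \forall s_n, s_n' \in S^n.\; \forall s_m, s_m' \in S^m.\; \forall q, q' \in Q.\; \forall x, x' \in X.\\
&\quad \lambda^{\mathbb{B}}((s_0, \ldots, s_0), (s_0, \ldots, s_0), x_0, q_0) \;\wedge\\
&\quad \Big(\lambda^{\mathbb{B}}(s_n, s_m, x, q) \wedge q' \in \delta\big(q, \upsilon \cdot l_\exists(x), l^n(s_n) \cdot l^m(s_m)\big) \wedge x' = \mu(x, \upsilon)\\
&\qquad \wedge\, s_n' = \tau^n(s_n, \upsilon) \wedge s_m' = \tau^m(s_m, l_\exists(x))\Big)\\
&\qquad \rightarrow \lambda^{\mathbb{B}}(s_n', s_m', x', q') \wedge \lambda^{\#}(s_n', s_m', x', q') \rhd \lambda^{\#}(s_n, s_m, x, q)
\end{aligned}
$$

where $\rhd$ is $>$ if $q' \in F$ and $\geq$ otherwise. The bounded synthesis algorithm increases the bound on the size of the strategy $\mathcal{S}_\exists$ until either the constraint system becomes satisfiable, or a given upper bound is reached. In case the constraint system is satisfiable, we can extract interpretations for the functions $\mu$ and $l_\exists$ using a solver that is able to produce models. These functions then represent the synthesized transition system $\mathcal{S}_\exists$.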


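Corollary 2. Let $\mathcal{S}$ be given, and let $\forall^*\exists^*\varphi$ be a HyperLTL formula where $\varphi$ is quantifier-free. If the constraint system is satisfiable for some bound on the size of $\mathcal{S}_\exists$, then $\mathcal{S} \models \forall^*\exists^*\varphi$.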


Proof. Follows immediately by Theorem 7. 


As the strategy realizability problem is decidable, we know that there is an upper bound on the size of a realizing $\mathcal{S}_\exists$ and, thus, the bounded synthesis approach is a decision procedure for the strategy realizability problem.


Corollary 3. The bounded synthesis algorithm decides the strategy realizability problem for $\forall^*\exists^*$ HyperLTL.


Proof. The existence of such an upper bound follows from Theorem 6. 


Approximating Prophecy. We introduce a new parameter to the strategy synthesis problem to approximate the information about the future that can be captured using prophecy variables. This bound represents a constant lookahead into future choices made by the environment. In other words, for a given k > 0, the strategy S^∃ is allowed to depend on the choices of the ∀-player in the next k steps. While constant lookahead is only an approximation of infinite clairvoyance, it suffices for many practical situations, as shown by prior case studies [9, 18].

We present a solution to synthesizing transition systems with constant lookahead for k > 0 using bounded synthesis. To simplify the presentation, we present the stand-alone problem with respect to a specification given as a universal co-Büchi automaton. The integration into the constraint system for ∀*∃* HyperLTL synthesis presented in the previous section is then straightforward. First, we present an extension of the transition system model that incorporates the notion of constant lookahead. The idea of this extension is to replace the initial state s_0 by a function init: Υ^k → S that maps input sequences of length k to some state. Thus, the transition system observes the first k inputs, chooses some initial state based on those inputs, and then progresses at the same pace as the input sequence. Next, we define the run graph of such a system S_k = (S, init, τ, l) and an automaton A = (Q, q_0, δ, F), where δ: Q × Υ × Γ → 2^Q, as the directed graph (V, E) with the set of vertices V = S × Q × Υ^k, the initial vertices (s, q_0, υ) ∈ V such that s = init(υ) for every υ ∈ Υ^k, and the edge relation E ⊆ V × V satisfying ((s, q, υ_1υ_2⋯υ_k), (s', q', υ'_1υ'_2⋯υ'_k)) ∈ E if, and only if,

$$\exists \upsilon_{k+1} \in \Upsilon.\ s' = \tau(s, \upsilon_{k+1}) \wedge q' \in \delta(q, \upsilon_1, l(s)) \wedge \bigwedge_{1 \le i \le k} \upsilon'_i = \upsilon_{i+1}.$$
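To make the window shift in this edge relation concrete, here is a hypothetical Java simulation (not the authors' encoding) of a k-lookahead transition system on a finite input prefix: init consumes the first k inputs, τ always receives the newest input υ_{k+1}, and the automaton is fed the oldest input υ_1 together with the output l(s). The generic interfaces stand in for S, Υ, and Γ and are assumptions of the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

/** Hypothetical sketch: the letters read by the automaton along a run of a k-lookahead system. */
final class LookaheadRun {
    /**
     * Follows the edges ((s, q, v_1...v_k), (s', q', v_2...v_{k+1})) generated by a finite input
     * prefix and returns, for each edge, the letter (v_1, l(s)) passed to the transition function delta.
     */
    static <I, S, O> List<Map.Entry<I, O>> run(int k, List<I> inputs, Function<List<I>, S> init,
                                                BiFunction<S, I, S> tau, Function<S, O> label) {
        if (k < 1 || inputs.size() < k) throw new IllegalArgumentException("need k >= 1 and at least k inputs");
        List<Map.Entry<I, O>> letters = new ArrayList<>();
        List<I> window = new ArrayList<>(inputs.subList(0, k));   // v_1 ... v_k
        S s = init.apply(new ArrayList<>(window));                 // initial vertex: s = init(v)
        for (int t = k; t < inputs.size(); t++) {
            I next = inputs.get(t);                                 // v_{k+1}, chosen by the environment
            letters.add(Map.entry(window.get(0), label.apply(s)));  // delta reads (v_1, l(s))
            s = tau.apply(s, next);                                 // s' = tau(s, v_{k+1})
            window.remove(0);                                       // v'_i = v_{i+1} for 1 <= i <= k
            window.add(next);
        }
        return letters;
    }

    public static void main(String[] args) {
        // k = 2; the state remembers the newest input in the window and outputs it.
        List<Map.Entry<String, String>> letters = run(2, List.of("a", "b", "c", "d"),
                (List<String> w) -> w.get(1), (String s, String v) -> v, (String s) -> s);
        System.out.println(letters); // [a=b, b=c]
    }
}
```

In this run, the output paired with the input at step t is the input of step t + 1, which no system without lookahead could produce.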
Lemma 1. Given a universal co-Büchi automaton A and a k-lookahead transition system S_k: S_k ⊨ A if, and only if, the run graph S_k × A is accepting.


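Finally, synthesis amounts to solving the following constraint system:

$$\exists \lambda^{\mathbb{B}}\colon S \times Q \times \Upsilon^k \to \mathbb{B}.\ \exists \lambda^{\#}\colon S \times Q \times \Upsilon^k \to \mathbb{N}.\ \exists \mathit{init}\colon \Upsilon^k \to S.\ \exists \tau\colon S \times \Upsilon \to S.\ \exists l\colon S \to \Gamma.$$
$$\big(\forall \upsilon \in \Upsilon^k.\ \lambda^{\mathbb{B}}(\mathit{init}(\upsilon), q_0, \upsilon)\big) \wedge {}$$
$$\forall \upsilon_1 \cdots \upsilon_{k+1} \in \Upsilon^{k+1}.\ \forall s, s' \in S.\ \forall q, q' \in Q.$$
$$\big(\lambda^{\mathbb{B}}(s, q, \upsilon_1 \cdots \upsilon_k) \wedge s' = \tau(s, \upsilon_{k+1}) \wedge q' \in \delta(q, \upsilon_1, l(s))\big)$$
$$\Rightarrow \lambda^{\mathbb{B}}(s', q', \upsilon_2 \cdots \upsilon_{k+1}) \wedge \lambda^{\#}(s, q, \upsilon_1 \cdots \upsilon_k) \trianglelefteq \lambda^{\#}(s', q', \upsilon_2 \cdots \upsilon_{k+1})$$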


Corollary 4. Given some k > 0, if the constraint system is satisfiable for some bound on the size of S_k, then S_k ⊨ A.


4 Synthesis with Quantifier Alternations 


We now build on the introduced techniques to solve the synthesis problem for HyperLTL with quantifier alternation, that is, we search for implementations that satisfy the given properties. In previous work [13], the synthesis problem for ∃*∀* HyperLTL was solved by a reduction to the distributed synthesis problem. We present an alternative synthesis procedure that (1) introduces the necessary concepts for the synthesis of the ∀*∃* fragment and (2) strictly decomposes the choice of the existential trace quantifier from the implementation.

Fix a formula of the form ∃^m∀^n φ. We again reduce the verification problem to the problem of determining whether a run graph is accepting. As the existential quantifiers do not depend on the universal ones, there is no future dependency and thus no need for prophecy variables or bounded lookahead. Formally, S^∃ is a tuple (X, x_0, μ, λ^∃) such that X is a set of states, x_0 ∈ X is the designated initial state, μ: X → X is the transition function, and λ^∃: X → Υ^m is the labeling function. S^∃ produces infinite sequences in (Υ^m)^ω without any knowledge about the behavior of the universally quantified traces. The run graph is then (S^m ∥ S^∃) × S^n × A_φ. The constraint system is built analogously to Sect. 3.2, with the difference that the representation of the system S is now also uninterpreted. In the resulting SMT constraint system, we have two bounds: one for the size of the implementation S and one for the size of S^∃.


Corollary 5. The bounded synthesis algorithm decides the realizability problem for ∃*∀^1 HyperLTL and is a semi-decision procedure for ∃*∀^{>1} HyperLTL.


The synthesis problem for formulas in the ∀*∃* HyperLTL fragment uses the same reduction to a constraint system as the strategy synthesis in Sect. 3.2, with the only difference that the transition system S itself is uninterpreted. In the resulting SMT constraint systems, we have three bounds: the size of the implementation S, the size of the strategy S^∃, and the lookahead k.

Fig. 1. HyperLTL model checking with MCHYPER


Corollary 6. Given a HyperLTL formula ∀^n∃^m φ, where φ is quantifier-free, ∀^n∃^m φ is realizable if the SMT constraint system corresponding to the run graph S^n × (S^m ∥ S^∃) × A_φ is satisfiable for some bounds on S, S^∃, and the lookahead k.


5 Implementations and Experimental Evaluation 


We have integrated the model checking technique with a manually provided strategy into the HyperLTL hardware model checker MCHYPER¹. For the synthesis of strategies and reactive systems from hyperproperties, we have developed a separate bounded synthesis tool based on SMT solving. In the following, we describe these implementations and report on experimental results. All experiments ran on a machine with a dual-core Core i7 at 3.3 GHz and 16 GB of memory.


Hardware Model Checking with Given Strategies. We have extended the model checker MCHYPER [18] from the alternation-free fragment to formulas with one quantifier alternation. The input to MCHYPER is a circuit description as an And-Inverter Graph in the AIGER format and a HyperLTL formula. Figures 1a and 1b show the model checking process in MCHYPER without and with quantifier alternation, respectively. For formulas with quantifier alternation, the model checker now also accepts a strategy as an additional AIGER circuit C_σ. Based on this strategy, MCHYPER creates a new circuit where only the inputs of the universal system copies are exposed and the inputs of the existential system copies are determined by the strategy. The new circuit is then model checked as described in [18] with ABC [4].

¹ Try the online tool interface with the latest version of MCHYPER: https://www.react.uni-saarland.de/tools/online/MCHyper/.

Verifying Hyperliveness 135

Table 1. Experimental results for MCHYPER on the software doping and mutual exclusion benchmarks. All experiments used the IC3 option for ABC. Model and property names correspond to the ones used in [9] and [18].

Model                 #Latches   Property              Time [s]
EC 0.05               17         (10.a) + (10.b)       1.8
EC 0.00625            23         (10.a) + (10.b)       53.4
AEC 0.05              19         (¬10.a) + (¬10.b)     2.8
AEC 0.00625           25         (¬10.a) + (¬10.b)     160.1
Bakery.a.n.s          47         Sym5                  50.6
                                 Sym6                  27.5
Bakery.a.n.s.5proc    90         Sym7                  461.3
                                 Sym8                  472.3

We evaluate our extension of MCHYPER on formulas with quantifier alternation based on benchmarks from software doping [9] and symmetry in mutual exclusion algorithms [18]. Both problems have previously been analyzed with MCHYPER; however, since the properties in both problems require quantifier alternation, we were previously limited to a (manually obtained) approximation of the properties as universal formulas. The correctness of such manual approximations is not guaranteed and has to be shown separately. By directly model checking the formula with quantifier alternation, we know that we are checking the correct formula without needing any additional proof of correctness.


Software Doping. D'Argenio et al. [9] examined a clean and a doped version of an emission control program of a car and used the previous version of MCHYPER to formally verify approximations of these properties. Robust cleanness is expressed in the one-alternation fragment using two ∀^2∃^1 HyperLTL formulas (given in Prop. 19 in [9], cf. Sect. 1). In [9], the formulas were strengthened into alternation-free formulas that imply the original properties. Despite the quantifier alternation, Table 1 shows that the new version of MCHYPER verifies the precise formulas in roughly the same time as the alternation-free approximations [9] while giving stronger correctness guarantees.


Symmetry in Mutual Exclusion Protocols. ∀*∃* HyperLTL allows us to specify symmetry for mutual exclusion protocols. In such protocols, we wish to guarantee that every request is eventually answered and that the grants are mutually exclusive. In our experiments, we used an implementation of the Bakery protocol [25]. Table 1 shows the verification results for the precise ∀^1∃^1 properties. Comparing these results to the performance on the approximations of the symmetry properties [18], we again observe that the verification times are similar. However, we gain the additional correctness guarantees described above.

Strategy and System Synthesis. For the synthesis of strategies for existential quantifiers and for the synthesis of reactive systems from hyperproperties, we have developed a separate bounded synthesis tool based on SMT solving with Z3 [29]. Our evaluation is based on two benchmark families: the dining cryptographers problem [5] and a simplified version of the symmetry problem in mutual exclusion protocols discussed previously. The results are shown in Table 2. Obviously, synthesis operates at a vastly smaller scale than model checking with given strategies. In the dining cryptographers example, Z3 was unable to find an implementation for the full synthesis problem, but could easily synthesize strategies for the existential trace quantifiers when provided with an implementation. With the progress of constraint solvers that employ quantification over Boolean functions [31], we expect scalability improvements for our synthesis approach.


Table 2. Summary of the experimental results on the benchmark sets described in Sect. 5. When no hyperproperty is given, only the LTL part is used.

Instance                     Hyperproperty                              |S|   |S^∃|   Time [s]
Dining cryptographers        distributed + deniability                                  TO
                             distributed + deniability with given S     (1)   1         1.2
Mutex                        -                                          2     -         <1
                             symmetry                                   3     1         3.4
Mutex w/o spurious grants    -                                          3     -         <1
                             symmetry                                   3     1         3.9
                             wait-free                                  3     3         46
                             symmetry + wait-free                       3     1         3840


6 Conclusions 


We have presented model checking and synthesis techniques for hyperliveness 
properties expressed as HyperLTL formulas with quantifier alternation. The 
alternation makes it possible to specify hyperproperties such as generalized non- 
interference, symmetry, and deniability. Our approach is the first method for the 
synthesis of reactive systems from HyperLTL formulas with quantifier alterna- 
tion and the first practical method for the verification of such specifications. 
The approach is based on a game-theoretic view of existential quantifiers, where the ∃-player reacts to decisions of the ∀-player. The key advantage is that the complementation of the system automaton is avoided (cf. [18]). Instead, a strategy must be found for the ∃-player. Since this can be done either manually or through automatic synthesis, the user of the model checking or synthesis tool has the opportunity to trade some automation for a significant gain in performance.

Abstract. Timing side channels pose a significant threat to the security and privacy of software applications. We propose an approach for mitigating this problem by decreasing the strength of the side channels as measured by entropy-based objectives, such as min-guess entropy. Our goal is to minimize the information leaks while guaranteeing a user-specified maximal acceptable performance overhead. We dub the decision version of this problem Shannon mitigation, and consider two variants, deterministic and stochastic. First, we show that the deterministic variant is NP-hard. However, we give a polynomial algorithm that finds an optimal solution from a restricted set. Second, for the stochastic variant, we develop an approach that uses optimization techniques specific to the entropy-based objective used. For instance, for min-guess entropy, we use mixed integer-linear programming. We apply the algorithm to a threat model where the attacker gets to make functional observations, that is, where she observes the running time of the program for the same secret value combined with different public input values. Existing mitigation approaches do not give confidentiality or performance guarantees for this threat model. We evaluate our tool SCHMIT on a number of micro-benchmarks and real-world applications with different entropy-based objectives. In contrast to the existing mitigation approaches, we show that in the functional-observation threat model, SCHMIT is scalable and able to maximize confidentiality under the performance overhead bound.


1 Introduction 


Information leaks through timing side channels remain a challenging problem [13,16,24,29,35,37,47]. A program leaks secret information through timing side channels if an attacker can deduce secret values (or their properties) by observing response times. We consider the problem of mitigating timing side channels. Unlike elimination techniques [7,31,46], which aim to completely remove timing leaks without considering the performance penalty, mitigation techniques [10,26,48] aim to weaken the leaks while keeping the penalty low.

© The Author(s) 2019

We define the Shannon mitigation problem, which decides whether there is a mitigation policy that achieves a lower bound on a given entropy-based security measure while respecting an upper bound on the performance overhead. Consider an example where the program under analysis has a secret variable with seven possible values and three different timing behaviors, each forming a cluster of secret values. It takes 1 second if the secret value is 1, 5 seconds if the secret is between 2 and 5, and 10 seconds if the secret value is 6 or 7. The entropy-based measure quantifies the remaining uncertainty about the secret after timing observations. Min-guess entropy [11,25,41] for this program is 1, because if the observed execution time is 1 second, the attacker guesses the secret in one try. A mitigation policy involves merging some timing clusters by introducing delays. A good solution might be to introduce a 9-second delay if the secret is 1, which merges two timing clusters. However, this might be disallowed by the budget on the performance overhead. Therefore, another solution must be found, such as introducing a 4-second delay when the secret is 1.
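The numbers in this example can be checked with a few lines of Java, assuming a uniform distribution over the seven secrets and computing min-guess entropy as the minimum of (|C| + 1)/2 over the timing clusters C (the general definition is given in Sect. 3). The class name is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical check of the introductory example under a uniform prior over the seven secrets. */
final class IntroExample {
    /** Min-guess entropy: the minimum over timing clusters of (cluster size + 1) / 2. */
    static double minGuessEntropy(int... clusterSizes) {
        return Arrays.stream(clusterSizes).mapToDouble(b -> (b + 1) / 2.0).min().getAsDouble();
    }

    public static void main(String[] args) {
        // Original clusters: 1 s -> {1}, 5 s -> {2, 3, 4, 5}, 10 s -> {6, 7}.
        System.out.println(minGuessEntropy(1, 4, 2)); // 1.0
        // 9 s delay for secret 1: clusters {2, 3, 4, 5} and {1, 6, 7}.
        System.out.println(minGuessEntropy(4, 3));    // 2.0
        // 4 s delay for secret 1: clusters {1, 2, 3, 4, 5} and {6, 7}.
        System.out.println(minGuessEntropy(5, 2));    // 1.5
    }
}
```

Both merges raise the min-guess entropy; the cheaper 4-second delay reaches 1.5 instead of 2.0.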

We develop two variants of the Shannon mitigation problem: deterministic and stochastic. The mitigation policy of the deterministic variant requires us to move all secret values associated with an observation to another observation, while the policy of the stochastic variant allows us to move only a portion of the secret values in an observation to another one. We show that the deterministic variant of the Shannon mitigation problem is intractable and propose a dynamic programming algorithm that approximates the optimal solution by searching through a restricted set of solutions. We develop an algorithm that reduces the stochastic variant of the problem to a well-known optimization problem that depends on the entropy-based measure. For instance, with min-guess entropy, the optimization problem is mixed integer-linear programming.

We consider a threat model where an attacker knows the public inputs 
(known-message attacks [26]), and furthermore, where the public input changes 
much more often than the secret inputs (for instance, secrets such as bank 
account numbers do not change often). As a result, for each secret, the attacker 
observes a timing function of the public inputs. We call this model functional 
observations of timing side channels. 

Our tool SCHMIT has three components: side channel discovery [45], search for the mitigation policy, and policy enforcement. The side channel discovery builds the functional observations [45] and measures the entropy of the secret set after the observations. The mitigation policy component implements the dynamic programming and optimization algorithms. The enforcement component is a monitoring system that uses the program internals and functional observations to enforce the policy at runtime. To summarize, we make the following contributions:


— We formalize the Shannon mitigation problem with two variants and show that finding a deterministic mitigation policy is NP-hard.

— We describe two algorithms for synthesizing the mitigation policy: one is based on dynamic programming for the deterministic variant, runs in polynomial time, and results in an approximate solution; the other solves the stochastic variant of the problem with optimization techniques.

— We consider a threat model that results in functional observations. On a set of micro-benchmarks, we show that existing mitigation techniques are neither secure nor efficient for this threat model.

— We evaluate our approach on five real-world Java applications. We show that SCHMIT scales, synthesizing mitigation policies within a few seconds, and significantly improves the security (entropy) of the applications.


    int Example(int high, int low) {
        int res = 0;
        int t_high = high;
        while (t_high > 0) {
            if (t_high % 2 == 1) {
                int t_low = low;  // rescan low for every set bit of high
                while (t_low > 0) {
                    if (t_low % 2 == 1) {
                        res += compute(t_low, t_high);  // compute(...) is not shown in the figure
                    }
                    t_low = t_low >> 1;
                }
            }
            t_high = t_high >> 1;
        }
        return res;
    }

[Fig. 1(b): timing functions of the secret classes; x-axis: public key (ordered by num. set bits).]


Fig. 1. (a) The example used in Sect. 2. (b) The timing functions for each secret value 
of the program. 


2 Overview 


First, we describe the threat model considered in this paper. Second, we 
describe our approach on a running example. Third, we compare the results 
of SCHMIT with the existing mitigation techniques [10,26,48] and show that 
SCHMIT achieves the highest entropy (i.e., best mitigation) for all three entropy 
objectives. 


Threat Model. We assume that the attacker has access to the source code and the mitigation model, and that she can sample the running time of the application arbitrarily many times on her own machine. During an attack, she intends to guess a fixed secret of the target machine by observing the mitigated running time. Since we consider attack models where the attacker knows the public inputs and the secret inputs are less volatile than the public inputs, her observations are functional observations: for each secret value, she learns a function from the public inputs to the running time.


Example 2.1. Consider the program shown in Fig. 1(a). It takes secret and public values as inputs. The running time depends on the number of set bits in both the secret and the public inputs. We assume that secret and public inputs are between 1 and 1023. Figure 1(b) shows the running time of different secret values as timing functions, i.e., functions from the public inputs to the running time.

Side channel discovery. One can use existing tools to find the initial functional observations [44,45]. In Example 2.1, the functional observations are F = (y, 2y, ..., 10y), where y is a variable whose value is the number of set bits in the public input. The corresponding secret classes after this observation are S_F = (l_1, l_2, l_3, ..., l_10), where l_n denotes the set of secret values that have n set bits. The sizes of the classes are B = (10, 45, 120, 210, 252, 210, 120, 45, 10, 1). We use the L1-norm as the metric to calculate the distance between the functional observations F. This distance (penalty) matrix specifies the extra performance overhead of moving from one functional observation to another. Assuming a uniform distribution over the secret input, the Shannon entropy, guessing entropy, and min-guess entropy are 7.3, 90.1, and 1.0, respectively. These entropies are defined in Sect. 3 and measure the remaining entropy of the secret set after the observations. We aim to maximize the entropy measures while keeping the performance overhead below a threshold, say 60% for this example.
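As a sketch of the distance (penalty) matrix, the following Java code computes L1 distances between the observations i·y. It assumes each observation is sampled at y = 1, ..., 10 (the possible numbers of set bits) with equal weight; the text does not spell out the sampling or how distances are converted into the percentage overheads quoted here, so the class and its sampling are hypothetical.

```java
/** Hypothetical L1 distances between the functional observations f_i(y) = i * y of Example 2.1. */
final class RunningExample {
    /** d[i-1][j-1] = sum over sampled y of |i*y - j*y|, with y = 1..10 (assumed sampling). */
    static int[][] l1Penalties() {
        int[][] d = new int[10][10];
        for (int i = 1; i <= 10; i++) {
            for (int j = 1; j <= 10; j++) {
                int sum = 0;
                for (int y = 1; y <= 10; y++) {
                    sum += Math.abs(i * y - j * y);
                }
                d[i - 1][j - 1] = sum;
            }
        }
        return d;
    }

    public static void main(String[] args) {
        int[][] d = l1Penalties();
        System.out.println(d[0][5]); // 275: moving y to 6y costs 5 * (1 + 2 + ... + 10)
    }
}
```

Because all observations here are multiples of y, every entry reduces to |i - j| · 55; weighting each y by the number of public inputs with y set bits would be an equally plausible alternative.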


Mitigation with SCHMIT. We use our tool SCHMIT to mitigate the timing leaks of Example 2.1. The mitigation policy for the Shannon entropy objective is shown in Fig. 2(a). The policy results in two classes of observations: it moves the functional observations (y, 2y, ..., 5y) to (6y) and all other observations (7y, 8y, 9y) to (10y). To enforce this policy, we use a monitoring system at runtime. The monitoring system uses a decision tree model of the initial functional observations. The decision tree model characterizes each functional observation with associated program internals such as method calls or basic block invocations [43,44]. The decision tree model for Example 2.1 is shown in Fig. 2(b). The monitoring system records program internals and matches them against the decision tree model to detect the current functional observation. Then, it adds delays, if necessary, to the execution time in order to enforce the mitigation policy. With this method, the mitigated functional observations are G = (6y, 10y) and the secret classes are S_G = ({l_1, l_2, l_3, l_4, l_5, l_6}, {l_7, l_8, l_9, l_10}), as shown in Fig. 2(c). The performance overhead of this mitigation is 43.1%. The Shannon, guessing, and min-guess entropies improve to 9.7, 459.6, and 193.5, respectively.

[Fig. 2 panels: (a) mitigation policy; (b) decision tree splitting on modExp_bblock_16; (c) mitigated observations, x-axis: public key (ordered by num. set bits).]

Fig. 2. (a) Mitigation policy calculation with the deterministic algorithm (left). The observations x1 and x2 stand for all observations from C2–C5 and from C7–C9, respectively; (b) Learned discriminant decision tree (center): it characterizes the functional clusters of Fig. 1(b) with internals of the program in Fig. 1(a); and (c) observations (right) after the mitigation by SCHMIT, which results in two classes of observations.


Comparison with state of the art. We compare our mitigation results to the black-box mitigation scheme [10] and bucketing [26]. Black-box double scheme technique. We use the double scheme technique [10] to mitigate the leaks of Example 2.1. This mitigation uses a prediction model to release events at scheduled times. Consider the prediction for releasing event i in the N-th epoch, S(N, i) = max(inp_i, S(N, i−1)) + p(N), where inp_i is the arrival time of the i-th request, S(N, i−1) is the prediction for request i−1, and p(N) = 2^{N−1} models the basis of the prediction scheme in the N-th epoch. We assume that the requests are of the same type and that the sequence of public input requests for each secret is received at the beginning of epoch N = 1. Figure 3(a) shows the functional observations after applying the predictive mitigation. With this mitigation, the classes of observations are S_G = ({l_1}, {l_2, l_3}, {l_4, l_5, l_6, l_7}, {l_8, l_9, l_10}). The number of classes of observations is reduced from 10 to 4. The performance overhead is 39.9%. The Shannon, guessing, and min-guess entropies increase to 9.00, 321.5, and 5.5, respectively. Bucketing. We consider the mitigation approach with buckets [26]. For Example 2.1, if the attacker does not know the public input (unknown-message attacks [26]), the observations are {1.1, 2.1, 3.3, ..., 9.9, 10.9, ..., 109.5}, as shown in Fig. 3(b). We apply the bucketing algorithm of [26] to these observations, and it finds two buckets {37.5, 109.5}, shown with the red lines in Fig. 3(b). The bucketing mitigation moves each observation to the closest bucket. Without functional observations, there are 2 classes of observations. However, with functional observations, there are more than 2 classes. Figure 3(c) shows how the pattern of observations leaks through functional side channels. There are 7 classes of observations: S_G = ({l_1, l_2, l_3}, {l_4}, {l_5}, {l_6}, {l_7}, {l_8}, {l_9}, {l_10}). The Shannon, guessing, and min-guess entropies are 7.63, 102.3, and 1.0, respectively.
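The prediction recurrence can be evaluated directly. This Java sketch assumes S(N, 0) = 0 and keeps the epoch N fixed for the whole request sequence, whereas the double scheme of [10] moves to a new epoch after a misprediction; it only illustrates the recurrence as written in the comparison above, and the class name is hypothetical.

```java
/** Hypothetical evaluation of S(N, i) = max(inp_i, S(N, i - 1)) + p(N) with p(N) = 2^(N - 1). */
final class DoubleScheme {
    /** arrivals[i - 1] = inp_i; assumes S(N, 0) = 0 and one fixed epoch N >= 1. */
    static long[] releaseTimes(long[] arrivals, int epoch) {
        long p = 1L << (epoch - 1);                 // p(N) = 2^(N - 1)
        long[] release = new long[arrivals.length];
        long previous = 0;                          // S(N, 0), assumed to be 0
        for (int i = 0; i < arrivals.length; i++) {
            previous = Math.max(arrivals[i], previous) + p;
            release[i] = previous;
        }
        return release;
    }

    public static void main(String[] args) {
        // All requests arrive at time 0, i.e., at the beginning of the epoch.
        System.out.println(java.util.Arrays.toString(releaseTimes(new long[]{0, 0, 0}, 1))); // [1, 2, 3]
        System.out.println(java.util.Arrays.toString(releaseTimes(new long[]{0, 5, 5}, 3))); // [4, 9, 13]
    }
}
```

Release times depend only on arrival times and the epoch, not on the secret, which is why the scheme reduces the number of observation classes.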


[Fig. 3: three panels, each plotting time (ms) against public key (ordered by num. set bits).]

Fig. 3. (a) The execution time after mitigation using the double scheme technique [10]. There are four classes of functional observations after the mitigation. (b) Mitigation with bucketing [26]. All observations must be moved to the closest red line. (c) Functional observations distinguish 7 classes of observations after mitigating with bucketing.

Overall, SCHMIT achieves the highest entropy measures for all three objectives under the performance overhead bound of 60%.


3 Preliminaries 


For a finite set Q, we use |Q| for its cardinality. A discrete probability distribution, or just distribution, over a set Q is a function d: Q → [0, 1] such that Σ_{q∈Q} d(q) = 1. Let D(Q) denote the set of all discrete distributions over Q. We say a distribution d ∈ D(Q) is a point distribution if d(q) = 1 for some q ∈ Q. Similarly, a distribution d ∈ D(Q) is uniform if d(q) = 1/|Q| for all q ∈ Q.


Definition 1 (Timing Model). The timing model of a program P is a tuple [[P]] = (X, Y, S, δ), where X = {x_1, ..., x_n} is the set of secret-input variables, Y = {y_1, ..., y_m} is the set of public-input variables, S ⊆ ℝ^n is a finite set of secret inputs, and δ: ℝ^n × ℝ^m → ℝ_{≥0} is the execution-time function of the program over the secret and public inputs.


We assume that the adversary knows the program and wishes to learn the value of the secret input. To do so, for some fixed secret value s ∈ S, the adversary can invoke the program to estimate (to an arbitrary precision) the execution time of the program. If the set of public inputs is empty, i.e., m = 0, the adversary can only make scalar observations of the execution time corresponding to a secret value. In the more general setting, however, the adversary can arrange her observations in a functional form by estimating an approximation of the timing function δ(s): ℝ^m → ℝ_{≥0} of the program.

A functional observation of the program P for a secret input s ∈ S is the function δ(s): ℝ^m → ℝ_{≥0} defined as y ∈ ℝ^m ↦ δ(s, y). Let F ⊆ [ℝ^m → ℝ_{≥0}] be the finite set of all functional observations of the program P. We define an order ⪯ over the functional observations F: for f, g ∈ F, we say that f ⪯ g if f(y) ≤ g(y) for all y ∈ ℝ^m.

The set F characterizes an equivalence relation ≡_F, namely secrets with equivalent functional observations, over the set S, defined as follows: s ≡_F s' if there is an f ∈ F such that δ(s) = δ(s') = f. Let S_F = (S_1, S_2, ..., S_k) be the quotient space of S characterized by the observations F = (f_1, f_2, ..., f_k). We write S_f for the secret set S ∈ S_F corresponding to the observation f ∈ F. Let B = (B_1, B_2, ..., B_k) be the sizes of the observational equivalence classes in S_F, i.e., B_i = |S_{f_i}| for f_i ∈ F, and let B = |S| = Σ_{i=1}^{k} B_i.
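The quotient S_F can be approximated by grouping secrets whose timing functions agree on a finite sample of public inputs. In this Java sketch, the timing model δ(s, y) = bitCount(s) · bitCount(y) is a hypothetical stand-in for the program of Fig. 1(a), and sampling only approximates equality of functions on all of ℝ^m; with this model, the class sizes come out as the B listed in Example 2.1.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntBinaryOperator;
import java.util.stream.IntStream;

/** Hypothetical sketch: groups secrets into the classes of S_F by their sampled timing functions. */
final class ObservationClasses {
    /** Two secrets share a class iff delta(s, y) agrees on every sampled public input y. */
    static Map<List<Integer>, List<Integer>> classes(int[] secrets, int[] publics, IntBinaryOperator delta) {
        Map<List<Integer>, List<Integer>> bySignature = new LinkedHashMap<>();
        for (int s : secrets) {
            List<Integer> signature = new ArrayList<>(publics.length);
            for (int y : publics) {
                signature.add(delta.applyAsInt(s, y));   // sampled value of delta(s)(y)
            }
            bySignature.computeIfAbsent(signature, key -> new ArrayList<>()).add(s);
        }
        return bySignature;
    }

    public static void main(String[] args) {
        int[] secrets = IntStream.rangeClosed(1, 1023).toArray();
        int[] publics = IntStream.rangeClosed(1, 1023).toArray();
        // Hypothetical timing model: (set bits of the secret) * (set bits of the public input).
        IntBinaryOperator delta = (s, y) -> Integer.bitCount(s) * Integer.bitCount(y);
        Map<List<Integer>, List<Integer>> sf = classes(secrets, publics, delta);
        sf.values().forEach(c -> System.out.print(c.size() + " "));
        System.out.println(); // 10 45 120 210 252 210 120 45 10 1
    }
}
```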

Shannon entropy, guessing entropy, and min-guess entropy are three prevalent information metrics to quantify information leaks in programs. Köpf and Basin [25] characterize expressions for various information-theoretic measures of information leaks when there is a uniform distribution on S, given below.


Proposition 1 (Köpf and Basin [25]). Let F = (f_1, ..., f_k) be a set of observations and let S be the set of secret values. Let B = (B_1, ..., B_k) be the corresponding sizes of the secret sets in each class of observation, and B = Σ_{i=1}^{k} B_i. Assuming a uniform distribution on S, the entropies can be characterized as:

1. Shannon Entropy: $SE(S|F) \stackrel{\text{def}}{=} \frac{1}{B} \sum_{1 \le i \le k} B_i \log_2(B_i)$,
2. Guessing Entropy: $GE(S|F) \stackrel{\text{def}}{=} \frac{1}{2B} \sum_{1 \le i \le k} B_i^2 + \frac{1}{2}$, and
3. Min-Guess Entropy: $mGE(S|F) \stackrel{\text{def}}{=} \min_{1 \le i \le k} \{(B_i + 1)/2\}$.
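A direct Java transcription of the three expressions in Proposition 1 follows; the class name is hypothetical, and the sketch assumes all class sizes are positive so that log2(B_i) is defined.

```java
/** Hypothetical transcription of Proposition 1: entropies of the remaining secret, uniform prior. */
final class RemainingEntropy {
    private static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    private static int total(int[] sizes) {
        int b = 0;
        for (int bi : sizes) b += bi;
        return b;
    }

    /** SE(S|F) = (1/B) * sum_i B_i * log2(B_i). */
    static double shannon(int[] sizes) {
        double sum = 0;
        for (int bi : sizes) sum += bi * log2(bi);
        return sum / total(sizes);
    }

    /** GE(S|F) = (1/(2B)) * sum_i B_i^2 + 1/2. */
    static double guessing(int[] sizes) {
        double sum = 0;
        for (int bi : sizes) sum += (double) bi * bi;
        return sum / (2.0 * total(sizes)) + 0.5;
    }

    /** mGE(S|F) = min_i (B_i + 1) / 2. */
    static double minGuess(int[] sizes) {
        double min = Double.POSITIVE_INFINITY;
        for (int bi : sizes) min = Math.min(min, (bi + 1) / 2.0);
        return min;
    }

    public static void main(String[] args) {
        int[] toy = {1, 4, 2}; // introductory example: clusters {1}, {2..5}, {6, 7}
        System.out.println(shannon(toy));  // about 1.43
        System.out.println(guessing(toy)); // 2.0
        System.out.println(minGuess(toy)); // 1.0
    }
}
```

On the class sizes B of Example 2.1, shannon returns about 7.30 and minGuess returns 1.0, matching the values reported there.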


4 Shannon Mitigation Problem 


Our goal is to mitigate the information leakage due to the timing side channels 
by adding synthetic delays to the program. An aggressive, but commonly-used, 
mitigation strategy aims to eliminate the side channels by adding delays such 
that every secret value yields a common functional observation. However, this 
strategy may often be impractical as it may result in unacceptable performance 
degradations of the response time. Assuming a well-known penalty function asso- 
ciated with the performance degradation, we study the problem of maximizing 
entropy while respecting a bound on the performance degradation. We dub the 
decision version of this problem Shannon mitigation. 

Adding synthetic delays to the execution time of the program, so as to mask the side channel, can give rise to new functional observations that correspond to upper envelopes of various combinations of original observations. Let F = (f_1, f_2, ..., f_k) be the set of functional observations. For I ⊆ {1, 2, ..., k}, let f_I: y ∈ ℝ^m ↦ sup_{i∈I} f_i(y) be the functional observation corresponding to the upper envelope of the functional observations in the set I. Let G(F) = {f_I : ∅ ≠ I ⊆ {1, 2, ..., k}} be the set of all possible functional observations resulting from the upper-envelope calculations. To change the observation of a secret value with functional observation f_i to a new observation f_I (we assume that i ∈ I), we need to add the delay function y ∈ ℝ^m ↦ f_I(y) − f_i(y).
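The upper envelope and the delay it requires can be written with plain function composition. This Java sketch restricts the public input to a single real number y (m = 1) and uses hypothetical names.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical sketch of f_I(y) = sup_{i in I} f_i(y) and of the delay y -> f_I(y) - f_i(y), for m = 1. */
final class UpperEnvelope {
    static DoubleUnaryOperator envelope(DoubleUnaryOperator... fs) {
        return y -> {
            double max = Double.NEGATIVE_INFINITY;
            for (DoubleUnaryOperator f : fs) max = Math.max(max, f.applyAsDouble(y));
            return max;
        };
    }

    /** Non-negative everywhere whenever fi is one of the functions enveloped by fI. */
    static DoubleUnaryOperator delay(DoubleUnaryOperator fi, DoubleUnaryOperator fI) {
        return y -> fI.applyAsDouble(y) - fi.applyAsDouble(y);
    }

    public static void main(String[] args) {
        DoubleUnaryOperator f1 = y -> y;        // observation y
        DoubleUnaryOperator f2 = y -> 2 * y;    // observation 2y
        DoubleUnaryOperator f3 = y -> 12 - y;   // crosses 2y at y = 4
        DoubleUnaryOperator fI = envelope(f1, f2, f3);
        System.out.println(fI.applyAsDouble(2));             // 10.0 (from 12 - y)
        System.out.println(fI.applyAsDouble(6));             // 12.0 (from 2y)
        System.out.println(delay(f1, fI).applyAsDouble(6));  // 6.0
    }
}
```

The crossing f3 shows why G(F) can contain observations that differ from every original f_i: the envelope follows different functions on different parts of the input space.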


Mitigation Policies. Let G ⊆ G(F) be a set of admissible post-mitigation observations. A mitigation policy is a function μ: F → D(G) that, for each secret s ∈ S_f, suggests the probability distribution μ(f) over the functional observations. We say that a mitigation policy is deterministic if μ(f) is a point distribution for all f ∈ F. Abusing notation, we represent a deterministic mitigation policy as a function μ: F → G. The semantics of a mitigation policy recommends to a program analyst a probability μ(f)(g) to elevate a secret input s ∈ S_f from the observational class f to the class g ∈ G by adding max{0, g(p) − f(p)} units of delay to the corresponding execution time δ(s, p) for all p ∈ Y. We assume that the mitigation policies respect the order, i.e., for every mitigation policy μ and for all f ∈ F and g ∈ G, μ(f)(g) > 0 implies f ⪯ g. Let M(F→G) be the set of mitigation policies from the set of observational clusters F into the clusters G.

For the functional observations F = (f1,...,f,) and a mitigation policy 
u E Mfg), the resulting observation set F[u] C G is defined as: 


Flu] ={g EG: there exists f € F such that u(f)(g) > 0}. 
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Since the mitigation policy is stochastic, we use average sizes of resulting obser- 
vations to represent fitness of a mitigation policy. For F[u] = (g1, go,---, ge), we 
define their expected class sizes B,, = (C1, C2,...,Ce) as C; = pei wf) fi) By 
(observe that oy C; = B). Assuming a uniform distribution on S, various 
entropies for the expected class size after applying a policy y E€ M(¢_4g) can be 
characterized by the following expressions: 


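1. Shannon Entropy: $SE(S \mid F, \mu) = \frac{1}{B} \sum_{1 \le i \le \ell} C_i \log_2(C_i)$,

2. Guessing Entropy: $GE(S \mid F, \mu) = \frac{1}{2B} \sum_{1 \le i \le \ell} C_i^2 + \frac{1}{2}$, and

3. Min-Guess Entropy: $mGE(S \mid F, \mu) = \min_{1 \le i \le \ell} \{(C_i + 1)/2\}$.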
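The expected class sizes and the three expressions above can be evaluated directly. The sketch below is a minimal illustration, assuming a policy is encoded as nested dictionaries `mu[f][g]` and the class sizes $B_j$ as a dictionary `sizes`; the function names are hypothetical and not taken from SCHMIT.

```python
import math
from typing import Dict, List


def expected_class_sizes(mu: Dict[str, Dict[str, float]],
                         sizes: Dict[str, int]) -> Dict[str, float]:
    """C_i = sum_j mu(f_j)(g_i) * B_j for every target class g_i that receives mass."""
    C: Dict[str, float] = {}
    for f, dist in mu.items():
        for g, q in dist.items():
            if q > 0:
                C[g] = C.get(g, 0.0) + q * sizes[f]
    return C


def shannon(C: List[float], B: float) -> float:
    """SE = (1/B) * sum_i C_i * log2(C_i)."""
    return sum(c * math.log2(c) for c in C) / B


def guessing(C: List[float], B: float) -> float:
    """GE = (1/(2B)) * sum_i C_i^2 + 1/2."""
    return sum(c * c for c in C) / (2 * B) + 0.5


def min_guess(C: List[float]) -> float:
    """mGE = min_i (C_i + 1) / 2."""
    return min((c + 1) / 2 for c in C)
```

With class sizes 2, 2 and 4 (so $B = 8$) and the identity policy, these give $SE = 1.5$, $GE = 2.0$ and $mGE = 1.5$; merging the first class into the second yields two classes of size 4 and raises $mGE$ to 2.5.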

We note that the above definitions do not represent the expected entropies, but
rather the entropies corresponding to the expected cluster sizes. However, the three
quantities provide bounds on the expected entropies after applying $\mu$. Since
Shannon and min-guess entropies are concave functions, from Jensen's inequality
we get that $SE(S \mid F, \mu)$ and $mGE(S \mid F, \mu)$ are upper bounds on the expected
Shannon and min-guess entropies. Similarly, $GE(S \mid F, \mu)$, being a convex function,
gives a lower bound on the expected guessing entropy.

We are interested in maximizing the entropy while respecting constraints on
the overall performance of the system. We formalize the notion of performance
by introducing performance penalties: there is a function $\pi : F \times G \to \mathbb{R}_{\ge 0}$
such that elevating from the observation $f \in F$ to the functional observation
$g \in G$ adds an extra performance overhead of $\pi(f, g)$ to the program. The expected
performance penalty associated with a policy $\mu$, written $\pi(\mu)$, is defined as the
probabilistically weighted sum of the penalties, i.e.
$\pi(\mu) = \sum_{f \in F, g \in G : f \le g} |S_f| \cdot \mu(f)(g) \cdot \pi(f, g)$.
Now, we introduce our key decision problem.


Definition 2 (Shannon Mitigation). Given a set of functional observations
$F = (f_1, \ldots, f_k)$, a set of admissible post-mitigation observations $G \subseteq \mathcal{G}(F)$,
a set of secrets $S$, a penalty function $\pi : F \times G \to \mathbb{R}_{\ge 0}$, a performance penalty
upper bound $\Delta \in \mathbb{R}_{\ge 0}$, and an entropy lower bound $E \in \mathbb{R}_{\ge 0}$, the Shannon
mitigation problem $\mathrm{SHAN}_{\mathcal{E}}(F, G, S, \pi, E, \Delta)$, for a given entropy measure
$\mathcal{E} \in \{SE, GE, mGE\}$, is to decide whether there exists a mitigation policy $\mu \in M(F \to G)$
such that $\mathcal{E}(S \mid F, \mu) \ge E$ and $\pi(\mu) \le \Delta$. We define the deterministic Shannon
mitigation variant, where the goal is to find such a deterministic policy.
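Checking whether a given policy witnesses a yes-instance of Definition 2 only requires evaluating $\pi(\mu)$ and the entropy measure. The minimal sketch below does this for the min-guess measure, assuming the nested-dictionary encoding `mu[f][g]` and a user-supplied penalty function `pi(f, g)` (both hypothetical). For an order-respecting policy, $\mu(f)(g) = 0$ whenever $f \not\le g$, so summing over the support of $\mu$ matches the sum over pairs with $f \le g$.

```python
from typing import Callable, Dict

Policy = Dict[str, Dict[str, float]]


def expected_penalty(mu: Policy, sizes: Dict[str, int],
                     pi: Callable[[str, str], float]) -> float:
    """pi(mu) = sum over (f, g) of |S_f| * mu(f)(g) * pi(f, g)."""
    return sum(sizes[f] * q * pi(f, g) for f, dist in mu.items() for g, q in dist.items())


def min_guess_entropy(mu: Policy, sizes: Dict[str, int]) -> float:
    """mGE computed from the expected class sizes C_g = sum_f mu(f)(g) * |S_f|."""
    C: Dict[str, float] = {}
    for f, dist in mu.items():
        for g, q in dist.items():
            if q > 0:
                C[g] = C.get(g, 0.0) + q * sizes[f]
    return min((c + 1) / 2 for c in C.values())


def is_solution(mu: Policy, sizes: Dict[str, int], pi: Callable[[str, str], float],
                E: float, Delta: float) -> bool:
    """Both conditions of Definition 2 for the min-guess measure."""
    return min_guess_entropy(mu, sizes) >= E and expected_penalty(mu, sizes, pi) <= Delta
```

Both checks run in time linear in the size of the policy, which is the polynomial-time verification step used for NP membership.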


5 Algorithms for Shannon Mitigation Problem 

5.1 Deterministic Shannon Mitigation 

We first establish the intractability of the deterministic variant. 
Theorem 1. The deterministic Shannon mitigation problem is NP-complete.


Proof. It is easy to see that the deterministic Shannon mitigation problem is in
NP: one can guess a certificate as a deterministic mitigation policy $\mu \in M(F \to G)$
and verify in polynomial time that it satisfies the entropy and overhead constraints.
Next, we sketch the hardness proof for the min-guess entropy measure
by providing a reduction from the two-way partitioning problem [28]. For the
Shannon entropy and guessing entropy measures, a reduction can be established
from the Shannon capacity problem [18] and the Euclidean sum-of-squares clustering
problem [8], respectively.

Given a set $A = \{a_1, a_2, \ldots, a_k\}$ of integer values, the two-way partitioning
problem is to decide whether there is a partition $A_1 \uplus A_2 = A$ into two sets $A_1$ and
$A_2$ with equal sums, i.e. $\sum_{a \in A_1} a = \sum_{a \in A_2} a$. W.l.o.g. assume that $a_i \le a_j$ for
$i \le j$. We reduce this problem to a deterministic Shannon mitigation problem
$\mathrm{SHAN}_{mGE}(F_A, G_A, S_A, \pi_A, E_A, \Delta_A)$ with $k$ clusters $F_A = G_A = (f_1, f_2, \ldots, f_k)$
and the secret set $S_A = (S_1, S_2, \ldots, S_k)$ such that $|S_i| = a_i$. If $\sum_{1 \le i \le k} a_i$
is odd, then the answer to the two-way partitioning instance is trivially no.
Otherwise, let $E_A = \frac{1}{2} \sum_{1 \le i \le k} a_i$. Notice that any deterministic mitigation
strategy that achieves min-guess entropy greater than or equal to $E_A$ must have
at most two clusters. On the other hand, the best min-guess entropy value can
be achieved by having just a single cluster. To avoid this and force exactly
two clusters, corresponding to the two parts of a solution to the two-way
partitioning instance $A$, we introduce performance penalties such that
merging more than $k - 2$ clusters is disallowed, by setting the performance penalty
$\pi_A(f, g) = 1$ and the performance overhead bound $\Delta_A = k - 2$. It is straightforward to
verify that the resulting min-guess entropy instance has a yes
answer if and only if the two-way partitioning instance does.


Since the deterministic Shannon mitigation problem is intractable, we design
an approximate solution for the problem. Note that the problem is hard even if we
only use existing functional observations for mitigation, i.e., $G = F$. Therefore,
we consider this case for the approximate solution. Furthermore, we assume
the following sequential dominance restriction on a deterministic policy $\mu$: for
$f, g \in F$, if $f \le g$ then either $\mu(f) \le g$ or $\mu(f) = \mu(g)$. In other words, for
any given $f \le g$, $f$ cannot be moved to a higher cluster than $g$ without having
$g$ be moved to that cluster. For example, Fig. 4(a) shows a Shannon mitigation
problem with four functional observations and all possible mitigation policies (we
write $\mu(i, j)$ for $\mu(f_i)(f_j)$). Figure 4(b) satisfies the sequential dominance
restriction, while Fig. 4(c) does not.
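The restriction is easy to test mechanically. The sketch below assumes, for simplicity, that the classes are indexed in increasing order $f_1 \le \ldots \le f_k$ (0-based in the code) and that a deterministic policy is encoded as a list `mu` where `mu[i]` is the index of the class that $f_i$ is moved to; this encoding and the function names are hypothetical.

```python
from typing import List


def respects_order(mu: List[int]) -> bool:
    """Classes are indexed in increasing order, so a class may only move up (mu[i] >= i)."""
    return all(target >= i for i, target in enumerate(mu))


def sequentially_dominant(mu: List[int]) -> bool:
    """For every i < j (so f_i <= f_j): either mu[i] <= j or mu[i] == mu[j]."""
    k = len(mu)
    return respects_order(mu) and all(
        mu[i] <= j or mu[i] == mu[j]
        for i in range(k) for j in range(i + 1, k)
    )
```

For instance, `[1, 1, 3, 3]` (merge $f_1$ into $f_2$ and $f_3$ into $f_4$) passes, while `[2, 1, 2, 3]` fails because $f_1$ jumps over $f_2$ to $f_3$ without $f_2$ being moved there as well.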

The search for deterministic policies satisfying the sequential dominance
restriction can be performed efficiently using dynamic programming, by memoizing
intermediate results.

Algorithm 1 provides pseudocode for the dynamic programming solution
that finds a deterministic mitigation policy satisfying sequential dominance.
The key idea is to start by considering policies that produce a single cluster
for subproblems $P_i$ with the observations $(f_1, \ldots, f_i)$, and
then compute policies producing one additional cluster in each step by reusing
the previously computed subproblems and keeping track of the performance
penalties. The algorithm terminates as soon as the solution of the current step
respects the performance bound. The complexity of the algorithm is $O(k^3)$.


Fig. 4. (a) Example of a Shannon mitigation problem with all possible mitigation policies
for 4 classes of observations. (b, c) Two examples of mitigation policies that
result in 2 and 3 classes of observations.


Algorithm 1. APPROXIMATE DETERMINISTIC SHANNON MITIGATION

Input: The Shannon entropy problem $\mathrm{SHAN}_{mGE}(F, G = F, S_F, \pi, E, \Delta)$
Output: The entropy table $T$.

for i = 1 to k do
    $T(i, 1) = \mathcal{E}(\biguplus_{1 \le j \le i} S_j)$
    if $\sum_{1 \le j \le i} \pi(j, i)(B_j/B) \le \Delta$ then $\Pi(i, 1) = \sum_{1 \le j \le i} \pi(j, i)(B_j/B)$
    else $\Pi(i, 1) = \infty$
if $\Pi(k, 1) < \infty$ then return $T$
for r = 2 to k do
    for i = 1 to k do
        $\Omega(i, r) = \{ j : 1 \le j < i \text{ and } \Pi(j, r-1) + \sum_{j < q \le i} \pi(q, i)(B_q/B) \le \Delta \}$
        if $\Omega(i, r) \ne \emptyset$ then $T(i, r) = \max_{j \in \Omega(i, r)} \min\big(T(j, r-1), \mathcal{E}(\biguplus_{q=j+1}^{i} S_q)\big)$
        else $T(i, r) = -\infty$
        let $j$ be the index that maximizes $T(i, r)$
        if $\Omega(i, r) \ne \emptyset$ then $\Pi(i, r) = \Pi(j, r-1) + \sum_{j < q \le i} \pi(q, i)(B_q/B)$
        else $\Pi(i, r) = \infty$
    if $\Pi(k, r) < \infty$ then return $T$
return $T$


5.2 Stochastic Shannon Mitigation Algorithm

Next, we solve the (stochastic) Shannon mitigation problem by posing it as
an optimization problem. Consider the stochastic Shannon mitigation problem
$\mathrm{SHAN}_{\mathcal{E}}(F, G = F, S_F, \pi, E, \Delta)$ with a stochastic policy $\mu : F \to \mathcal{D}(G)$ and
$S_F = (S_1, S_2, \ldots, S_k)$. The following program characterizes the optimization
problem that solves the Shannon mitigation problem with a stochastic policy.


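Maximize $\mathcal{E}$, subject to:

(1) $0 \le \mu(f_i)(f_j) \le 1$ for $1 \le i, j \le k$.
(2) $\sum_{1 \le j \le k} \mu(f_i)(f_j) = 1$ for all $1 \le i \le k$.
(3) $\sum_{1 \le i \le k} \sum_{1 \le j \le k} |S_i| \cdot \mu(f_i)(f_j) \cdot \pi(f_i, f_j) \le \Delta$.
(4) $C_j = \sum_{1 \le i \le k} |S_i| \cdot \mu(f_i)(f_j)$ for $1 \le j \le k$.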


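Here, the objective function $\mathcal{E}$ is one of the following functions:

1. Guessing Entropy: $\mathcal{E}_{GE} = \sum_{j=1}^{k} C_j^2$
2. Min-Guess Entropy: $\mathcal{E}_{mGE} = \min\{ C_j \mid C_j > 0 \}$
3. Shannon Entropy: $\mathcal{E}_{SE} = \sum_{j=1}^{k} C_j \log_2(C_j)$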


The linear constraints of the problem are defined as follows. Conditions (1)
and (2) express that $\mu$ provides probability distributions, condition (3) imposes
the performance constraint, and condition (4) is the entropy-specific constraint.
The objective function of the optimization problem is defined based on the
entropy criterion $\mathcal{E}$. For simplicity, we omit the constant terms from the objective
function definitions. For the guessing entropy, the problem is an instance of a
linearly constrained quadratic optimization problem [33]. The problem with Shannon
entropy is a non-linear optimization problem [12]. Finally, the optimization problem
with min-guess entropy is an instance of mixed integer programming [32]. We evaluate
the scalability of these solvers empirically in Sect. 6 and leave the exact complexity
as an open problem. We show that the min-guess entropy objective can be solved
efficiently with branch-and-bound algorithms [36]. Figures 4(b) and 4(c) show two
instantiations of the mitigation policies that are possible for the stochastic mitigation.
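For very small instances, the stochastic program with the min-guess objective can be explored by brute force, which makes the roles of constraints (1) to (4) concrete. The sketch below is not the mixed-integer encoding solved by branch and bound; it is a plainly named grid search, assuming probabilities restricted to multiples of `1/steps` and order-respecting policies (class `i` may only move to `j >= i`). Its running time is exponential in $k$, so it is only an illustration.

```python
import itertools
from typing import Callable, Iterator, List, Optional, Tuple


def distributions(n_targets: int, steps: int) -> Iterator[List[float]]:
    """All probability vectors of length n_targets with entries in multiples of 1/steps
    (stars and bars: choose n_targets - 1 bar positions among steps + n_targets - 1 slots)."""
    slots = steps + n_targets - 1
    for bars in itertools.combinations(range(slots), n_targets - 1):
        bounds = (-1,) + bars + (slots,)
        yield [(bounds[t + 1] - bounds[t] - 1) / steps for t in range(n_targets)]


def grid_search_mge(sizes: List[int], pi: Callable[[int, int], float], delta: float,
                    steps: int = 10) -> Optional[Tuple[float, List[List[float]]]]:
    """Brute-force the min-guess program over a probability grid.
    Classes are indexed in increasing order, so row i only has mass on columns j >= i."""
    k = len(sizes)
    rows = [list(distributions(k - i, steps)) for i in range(k)]
    best = None
    for choice in itertools.product(*rows):
        mu = [[0.0] * i + row for i, row in enumerate(choice)]  # satisfies (1) and (2)
        penalty = sum(sizes[i] * mu[i][j] * pi(i, j) for i in range(k) for j in range(k))
        if penalty > delta:                                      # constraint (3)
            continue
        C = [sum(sizes[i] * mu[i][j] for i in range(k)) for j in range(k)]  # constraint (4)
        value = min(c for c in C if c > 0)                       # objective E_mGE
        if best is None or value > best[0]:
            best = (value, mu)
    return best
```

With class sizes `[2, 2, 4]` and a penalty equal to the gap between hypothetical per-class times `[1, 2, 4]`, a budget of 0 only admits the identity policy (objective 2), while a budget of 2 admits moving the first class entirely into the second (objective 4).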


6 Implementation Details 


A. Environmental Setup. All timing measurements are conducted on an
Intel NUC5i5RYH. We switch off JIT compilation, run each experiment
multiple times, and use the mean running time. This helps reduce the effects
of environmental factors such as garbage collection. All other analyses are
conducted on an Intel i5 2.7 GHz machine.


B. Implementation of Side Channel Discovery. We use the technique presented
in [45] for side channel discovery. The technique applies functional
data analysis [38] to create a B-spline basis and fit functions to the vector of timing
observations for each secret value. Then, the technique applies functional
data clustering [21] to obtain K classes of observations. We use the number of
secret values in a cluster as the class size metric and the $L_1$ distance norm
between the clusters as the penalty function.


C. Implementation of Mitigation Policy Algorithms. For the stochastic
optimization, we encode the Shannon entropy and guessing entropy with linear
constraints in SciPy [22]. Since the objective functions are non-linear (for the
Shannon entropy) and quadratic (for the guessing entropy), we use SciPy's sequential
least squares programming (SLSQP) solver [34] to maximize the objectives. For the
stochastic optimization with the min-guess entropy, we encode the problem in
Gurobi [19] as a mixed-integer programming (MIP) problem [32]. Gurobi solves
the problem efficiently with branch-and-bound algorithms [1]. We use Java to
implement the dynamic programming.


D. Implementation of Enforcement. The enforcement of a mitigation policy
is implemented in two steps. First, we take the initial timing functions and
characterize them with program-internal properties such as basic block calls. To
do so, we use the decision tree learning approach presented in [45]. The decision
tree model characterizes each functional observation with properties of program
internals. Second, given the mitigation policy, we enforce it with a monitoring
system implemented on top of the Javassist [15] library. The monitoring system
uses the decision tree model and matches the properties enabled during an execution
against the tree model (detection of the current cluster). Then, it adds extra delays,
based on the mitigation policy, to the current execution time and thereby enforces
the mitigation policy. Note that the dynamic monitoring itself can introduce delays
of a few microseconds. For programs with timing differences on the order of
microseconds, we instead transform the source code using the decision tree model.
The transformation requires manual effort to modify and compile the new program,
but it adds negligible delays.
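The second enforcement step can be sketched as follows. In SCHMIT the current class is detected by matching the decision tree model against the execution; in this hypothetical Python sketch the detected class is simply passed in as `observed_class`, a target class is drawn from $\mu(f)$, and the padding $\max\{0, g(p) - f(p)\}$ is returned (and optionally slept). The function name and the assumption that observation values are in seconds are illustrative only.

```python
import random
import time
from typing import Callable, Dict

Obs = Callable[[float], float]  # functional observation: public input -> execution time


def enforce(observed_class: str, p: float, mu: Dict[str, Dict[str, float]],
            obs: Dict[str, Obs], rng: random.Random, sleep: bool = False) -> float:
    """Draw a target class g ~ mu(f) for the detected class f and return the
    padding max{0, g(p) - f(p)}; optionally sleep for that long."""
    targets = list(mu[observed_class].keys())
    weights = list(mu[observed_class].values())
    g = rng.choices(targets, weights=weights, k=1)[0]
    extra = max(0.0, obs[g](p) - obs[observed_class](p))
    if sleep:
        time.sleep(extra)  # assumes observation values are in seconds (an assumption)
    return extra
```

A deterministic policy always yields the same padding for a given class and public input; a stochastic one spreads executions of the same class over several target classes.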


E. Micro-benchmark Results. Our goal is to compare different mitigation
methods in terms of their security and performance. We examine the computation
time of our tool SCHMIT in calculating the mitigation policies. See the appendix
for the relationship between performance bounds and entropy measures.


Applications: Mod_Exp applications [30] are instances of square-and-multiply
modular exponentiation ($R = y^k \bmod n$) used for secret key operations in
RSA [39]. The Branch_and_Loop series consists of 6 applications, where each
application has conditions over secret values and runs a linear loop over the public
values. The running time of the applications depends on the slope of the linear
loops determined by the secret input.
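To see why square-and-multiply exponentiation leaks through timing, the following Python sketch counts modular multiplications: every key bit costs a squaring, and every set bit costs an extra multiplication, so the running time depends on the number of set bits of the secret exponent. This is a textbook left-to-right variant written for illustration, not the benchmark code from [30].

```python
from typing import Tuple


def square_and_multiply(y: int, key: int, n: int) -> Tuple[int, int]:
    """Left-to-right square-and-multiply computing y^key mod n.
    Returns the result and the number of modular multiplications performed."""
    result, mults = 1, 0
    for bit in bin(key)[2:]:
        result = (result * result) % n   # every bit: one squaring
        mults += 1
        if bit == "1":                   # secret-dependent branch
            result = (result * y) % n
            mults += 1
    return result, mults
```

Two 4-bit keys such as `0b1000` and `0b1111` therefore cost 5 and 8 multiplications, respectively, which is exactly the kind of secret-dependent timing that the mitigation policies aim to hide.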


Computation time comparisons: Fig. 5 shows the computation time for the
Branch_and_Loop applications (the applications are ordered along the x-axis by
the discovered number of observational classes). For the min-guess entropy,
we observe that both the stochastic and dynamic programming approaches are
efficient, as shown in Fig. 5(a). For the Shannon and guessing entropies,
the dynamic programming approach is scalable, while the stochastic mitigation is
computationally expensive beyond 60 classes of observations, as shown in Fig. 5(b,c).


Mitigation Algorithm Comparisons: Table 1 shows micro-benchmark results that
compare the four mitigation algorithms on the two program series. First, the double
scheme mitigation technique [10] does not provide guarantees on the performance
overhead, and we can see that the overhead increases by more than 75 times
for mod_exp_6. The double scheme method reduces the number of classes of
observations; however, we observe that this mitigation has difficulty improving the
min-guess entropy. Second, the bucketing algorithm [26] can guarantee the performance
overhead, but it is not an effective method to improve the security of
functional observations; see the examples mod_exp_6 and Branch_and_Loop_6.
Third, SCHMIT's algorithms guarantee that the performance overhead stays below a
given bound while achieving the highest entropy values. In most cases, the
stochastic optimization technique achieves the highest min-guess entropy value. Here,
we show the results with the min-guess entropy measure. We also have strong
evidence that SCHMIT achieves higher Shannon and guessing entropies.
For example, in B_L_5, the initial Shannon entropy of 2.72 improves to
6.62, 4.1, 7.56, and 7.28 for the double scheme, the bucketing, the stochastic,
and the deterministic algorithms, respectively.
Fig. 5. Computation time for synthesizing mitigation policies over the Branch_and_Loop
applications. Computation time for min-guess entropy (a) takes only a few seconds.
Computation time for the Shannon entropy (b) and guessing entropy (c) is expensive using
stochastic optimization. We set the time-out to 10 hours.


7 Case Study 
Research Question. Does SCHMIT scale well and improve the security of appli- 
cations (entropy measures) within the given performance bounds? 


Methodology. We use the deterministic and stochastic algorithms for mitigat- 
ing the leaks. We show our results for the min-guess entropy, but other entropy 
measures can be applied as well. Since the task is to mitigate existing leakages, 
we assume that the secret and public inputs are given. 


Objects of Study. We consider four real-world applications: 


In the inset table, we show the basic characteristics of these benchmarks. 
Application                Num       Num      Num      ε      Initial    Initial
                           methods   secret   public          clusters   min-guess
GabFeed                    573       1,105    65       6.50   34         1.0
Jetty                      63        800      635      0.1    20         4.5
Java Verbal Expressions    61        2,000    10       0.02   9          50.5
Password Checker           6         20       2,620    0.05   6          1.0


GabFeed is a chat server with 573 methods [4]. There is a side channel in the
authentication part of the application, where the application takes users' public
keys and its own private key and generates a common key [14]. The vulnerability
leaks the number of set bits in the secret key. Initial functional observations
are shown in Fig. 6a. There are 34 clusters and the min-guess entropy is 1. We aim
to maximize the min-guess entropy under a performance overhead of 50%.


Jetty. We mitigate the side channels in the util.security package of the Eclipse Jetty
web server. The package has a Credential class which had a timing side channel.
This vulnerability was analyzed in [14] and fixed initially in [6]. Then, the developers
noticed that the implementation in [6] can still leak information and fixed
this issue with a new implementation in [5]. However, this new implementation
still leaks information [45]. We apply SCHMIT to mitigate this timing side
channel. The initial functional observations are shown in Fig. 6d. There are 20 classes
of observations and the initial min-guess entropy is 4.5. We aim to maximize the
min-guess entropy under a performance overhead of 50%.


Java Verbal Expressions is a library with 61 methods that construct regular
expressions [2]. If the library has secret inputs, it contains a timing side channel
similar to the password comparison vulnerability [3]. In this case, starting from
the initial character of a candidate expression, if the character matches the
regular expression, the library takes slightly more time to respond to the request
than otherwise. This vulnerability can leak all the regular expressions. We consider
regular expressions with a maximum size of 9. There are 9 classes of
observations and the initial min-guess entropy is 50.5. We aim to maximize the
min-guess entropy under a performance overhead of 50%.


Password Checker. We consider the password matching example from loginBad 
program [9]. The password stored in the server is secret, and the user’s guess is a 
public input. We consider 20 secrets (each of length at most 6) and 2,620 public inputs.
There are 6 different clusters, and the initial min-guess entropy is 1. 
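The leak in such password checkers comes from an early-exit comparison. The following Python sketch is a generic illustration of that pattern (not the loginBad code from [9]): it counts loop iterations as a stand-in for execution time, and the count grows with the length of the prefix that the guess shares with the secret.

```python
from typing import Tuple


def early_exit_compare(secret: str, guess: str) -> Tuple[bool, int]:
    """Compare character by character and stop at the first mismatch.
    Returns (match, number of loop iterations); the iteration count stands in
    for execution time and grows with the length of the matching prefix."""
    steps = 0
    for a, b in zip(secret, guess):
        steps += 1
        if a != b:
            return False, steps
    return len(secret) == len(guess), steps
```

For a fixed public guess such as `"abc"`, the secrets `"xyz"`, `"ayz"` and `"abz"` take 1, 2 and 3 iterations, so secrets fall into observation classes by the length of their shared prefix with the guess.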


Findings for GabFeed. With the stochastic algorithm, SCHMIT calculates a
mitigation policy that results in 4 clusters. This policy improves the min-guess
entropy from 1 to 138.5 and adds an overhead of 42.8%. With the deterministic
algorithm, SCHMIT returns 3 clusters. The performance overhead is 49.7% and
the min-guess entropy improves from 1 to 106. The user chooses the deterministic
policy and enforces the mitigation. We apply CART decision tree learning and
characterize the classes of observations with GabFeed method calls, as shown in
Fig. 6b. The monitoring system uses the decision tree model and automatically
detects the current class of observation. Then, it adds extra delays based on
the mitigation policy to enforce it. The results of the mitigation are shown in
Fig. 6c. Answer to our research question. Scalability: it takes about 1 second
to calculate the stochastic and the deterministic policies. Security: the stochastic
and deterministic variants improve the min-guess entropy by more than 100 times
under the given performance overhead bound of 50%.

Fig. 6. Initial functional observations, decision tree, and the mitigated observations
(from left to right) for GabFeed, Jetty, and Verbal Expressions (from top to bottom).


Findings for Jetty. The stochastic algorithm and the deterministic algorithm 
find the same policy that results in 1 cluster with 39.6% performance over- 
head. The min-guess entropy improves from 4.5 to 400.5. For the enforcement, 
SCHMIT first uses the initial clusterings and specifies their characteristics with 
program internals that result in the decision tree model shown in Fig. 6e. Since 
the response time is in the order of micro-seconds, we transform the source code 
using the decision tree model by adding extra counter variables. The results of
the mitigation are shown in Fig. 6f. Scalability: it takes less than 1 second to
calculate the policies for both algorithms. Security: the stochastic and deterministic
variants improve the min-guess entropy by 89 times under the given performance
overhead.


Findings for Java Verbal Expressions. For the stochastic algorithm, the 
policy results in 2 clusters, and the min-guess entropy has improved to 500.5. The 
performance overhead is 36%. For the dynamic programming, the policy results 
in 2 clusters. This adds 28% of performance overhead, while it improves the 
min-guess entropy from 50.5 to 450.5. The user chooses to use the deterministic 
policy for the mitigation. For the mitigation, we transform the source code using 
the decision tree model and add the extra delays based on the mitigation policy. 


Findings for Password Matching. Both the deterministic and the stochastic 
algorithms result in finding a policy with 2 clusters where the min-guess entropy 
has improved from 1 to 5.5 with the performance overhead of 19.6%. For the 
mitigation, we transform the source code using the decision tree model and add 
extra delays based on the mitigation policy if necessary. 
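Because $mGE = (C_{\min} + 1)/2$ under the uniform prior, the reported numbers can be cross-checked against the inset table by inverting this formula. The short Python sketch below does that arithmetic (the helper name is hypothetical): for instance, GabFeed's deterministic result of 106 implies a smallest class of $2 \cdot 106 - 1 = 211$ secrets, and Jetty's single cluster of all 800 secrets gives exactly 400.5, which is $400.5 / 4.5 = 89$ times the initial value.

```python
def smallest_class(mge: float) -> float:
    """Invert mGE = (C_min + 1) / 2 to recover the smallest (expected) class size."""
    return 2 * mge - 1


# (number of secrets, clusters after mitigation, reported min-guess entropy)
reported = {
    "Jetty": (800, 1, 400.5),
    "Java Verbal Expressions": (2000, 2, 500.5),
    "Password Checker": (20, 2, 5.5),
}
for app, (secrets, clusters, mge) in reported.items():
    c_min = smallest_class(mge)
    # class sizes sum to the number of secrets, so if the smallest class times the
    # number of classes already covers all secrets, every class has the same size
    print(f"{app}: smallest class {c_min:g}, balanced={c_min * clusters == secrets}")
```

All three reported policies correspond to perfectly balanced classes, which is the best possible min-guess entropy for the given number of clusters.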


8 Related Work 


Quantitative information theory has been widely used to measure how much
information is leaked through side-channel observations [11,20,25,41]. Mitigation
techniques increase the remaining entropy of secret sets leaked through
the side channels, while considering the performance [10,23,26,40,48,49].

Köpf and Dürmuth [26] use a bucketing algorithm to partition programs'
observations into intervals. With the unknown-message threat model, Köpf and
Dürmuth [26] propose a dynamic programming algorithm to find the optimal
number of possible observations under a performance penalty. The works [10,48]
introduce different black-box schemes to mitigate leaks. In particular, Askarov
et al. [10] show that time-quantizing techniques, which release events at
scheduled constant slots, have worst-case leakage if a slot is not filled with
events. Instead, they introduce the double scheme method, which has a schedule of
predictions like the quantizing approach, but if the event source fails to deliver
events at the predicted time, the failure results in generating a new schedule in
which the interval between predictions is doubled. We compare our mitigation
technique with both algorithms throughout this paper.
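The two baselines can be sketched in a few lines. Both functions below are simplified illustrations with hypothetical names, not code from [26] or [10]: `bucketize` pads an execution time up to the next boundary of a fixed, sorted list of buckets, and `double_scheme` releases events at predicted slots and doubles the prediction interval whenever an event is not ready at its slot.

```python
import bisect
from typing import List


def bucketize(t: float, buckets: List[float]) -> float:
    """Bucketing: delay an execution that finishes at time t until the next bucket
    boundary. `buckets` must be sorted and its last boundary must be >= t."""
    return buckets[bisect.bisect_left(buckets, t)]


def double_scheme(ready_times: List[float], quantum: float) -> List[float]:
    """Simplified double scheme: events are released at predicted slots spaced
    `quantum` apart; whenever an event is not ready at its slot (a misprediction),
    the interval between predictions is doubled before the next slot is scheduled."""
    releases: List[float] = []
    next_slot = quantum
    for ready in ready_times:          # ready_times must be non-decreasing
        while ready > next_slot:       # misprediction: start a new, slower epoch
            quantum *= 2
            next_slot += quantum
        releases.append(next_slot)
        next_slot += quantum
    return releases
```

The contrast with mitigation policies is visible here: bucketing bounds the padding by the bucket layout alone, while the double scheme can grow its interval, and hence the overhead, without a fixed bound.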

Elimination of timing side channels is a common technique to guarantee the 
confidentiality of software [7,17,27,30,31,46]. The work [46] aims to eliminate 
side channels using static analysis enhanced with various techniques to keep the 
performance overheads low without guaranteeing the amounts of overhead. In 
contrast, we use dynamic analysis and allow a small amount of information to 
leak, but we guarantee an upper-bound on the performance overhead. 

Machine learning techniques have been used for explaining timing differences 
between traces [42-44]. Tizpaz-Niari et al. [44] consider performance issues in
software. They also cluster execution times of programs and then explain what
program properties distinguish the different functional clusters. We adopt their
techniques for our security problem.
Abstract. We address the problem of verifying k-safety properties: properties
that refer to k interacting executions of a program. A prominent way to verify 
k-safety properties is by self composition. In this approach, the problem of check- 
ing k-safety over the original program is reduced to checking an “ordinary” safety 
property over a program that executes k copies of the original program in some 
order. The way in which the copies are composed determines how complicated it 
is to verify the composed program. We view this composition as provided by a 
semantic self composition function that maps each state of the composed program 
to the copies that make a move. Since the “quality” of a self composition func- 
tion is measured by the ability to verify the safety of the composed program, we 
formulate the problem of inferring a self composition function together with the 
inductive invariant needed to verify safety of the composed program, where both 
are restricted to a given language. We develop a property-directed inference algo- 
rithm that, given a set of predicates, infers composition-invariant pairs expressed 
by Boolean combinations of the given predicates, or determines that no such pair 
exists. We implemented our algorithm and demonstrate that it is able to find self 
compositions that are beyond reach of existing tools. 


1 Introduction 


Many relational properties, such as noninterference [12], determinism [21], service 
level agreements [9], and more, can be reduced to the problem of k-safety, namely,
reasoning about k different traces of a program simultaneously. A common approach
to verifying k-safety properties is by means of self composition, where the program 
is composed with k copies of itself [4,32]. A state of the composed program consists 
of the states of each copy, and a trace naturally corresponds to k traces of the original 
program. Therefore, k-safety properties of the original program become ordinary safety 
properties of the composition, hence reducing k-safety verification to ordinary safety. 
This enables reasoning about k-safety properties using any of the existing techniques 
for safety verification such as Hoare logic [20] or model checking [7]. 

While self composition is sound and complete for k-safety, its applicability is ques- 
tionable for two main reasons: (i) considering several copies of the program greatly 
increases the state space; and (ii) the way in which the different copies are com- 
posed when reducing the problem to safety verification affects the complexity of 
the resulting self composed program, and as such affects the complexity of verifying
it. Improving the applicability of self composition has been the topic of many
works [2,14,18,26,30,33]. However, most efforts are focused on compositions that are
pre-defined, or only depend on syntactic similarities.

In this paper, we take a different approach; we build upon the observation that by 
choosing the “right” composition, the verification can be greatly simplified by leverag- 
ing “simple” correlations between the executions. To that end, we propose an algorithm, 
called PDSC, for inferring a property directed self composition. Our approach uses a 
dynamic composition, where the composition of the different copies can change during 
verification, directed at simplifying the verification of the composed program. 

Compositions considered in previous work differ in the order in which the copies 
of the program execute: either synchronously, asynchronously, or in some mix of the 
two [3, 14,34]. To allow general compositions, we define a composition function that 
maps every state of the composed program to the set of copies that are scheduled in 
the next step. This determines the order of execution for the different copies, and thus 
induces the self composed program. Unlike most previous works where the composition 
is pre-defined based on syntactic rules only, our composition is semantic as it is defined 
over the state of the composed program. 

To capture the difficulty of verifying the composed program, we consider verifi- 
cation by means of inferring an inductive invariant, parameterized by a language for 
expressing the inductive invariant. Intuitively, the more expressive the language needs 
to be, the more difficult the verification task is. We then define the problem of inferring 
a composition function together with an inductive invariant for verifying the safety of 
the composed program, where both are restricted to a given language. Note that for a
fixed language $\mathcal{L}$, an inductive invariant may exist for some composition function but
not for another¹. Thus, the restriction to $\mathcal{L}$ defines a target for the inference algorithm,
which is now directed at finding a composition that admits an inductive invariant in $\mathcal{L}$.


Example 1. To demonstrate our approach, consider the program in Fig. 1. The program 
inserts a new value into an array. We assume that the array A and its length len are
“low”-security variables, while the inserted value h is “high”-security. The first loop
finds the location in which h will be inserted. Note that the number of iterations depends
on the value of h. Due to that, the second loop executes to ensure that the output i (which
corresponds to the number of iterations) does not leak sensitive data. As an example, we
emphasize that without the second loop, i could leak the location of h in A. To express
the property that i does not leak sensitive data, we use the 2-safety property that in any
two executions, if the inputs A and len are the same, so is the output i.

To verify the 2-safety property, consider two copies of the program. Let the language
$\mathcal{L}$ for verifying the self composition be defined by the predicates depicted in Fig. 1. The
most natural self composition to consider is a lock-step composition, where the copies
execute synchronously. However, for such a composition the composed program may
reach a state where, for example, $i_1 = i_2 + 1$. This occurs when the first copy exits the
first loop, while the second copy is still executing it. Since the language cannot express
this correlation between the two copies, no inductive invariant in $\mathcal{L}$ suffices to verify that
$i_1 = i_2$ when the program terminates.


¹ See the extended version [29] for an example that requires a non-linear inductive invariant with
a composition that is based on the control structure but has a linear invariant with another.
int arrayInsert(int[] A, int len, int h) {
    int i = 0;
1:  while (i < len && A[i] < h)
        i++;
2:  len = shift_array(A, i, 1);
    A[i] = h;
3:  while (i < len)
        i++;
4:  return i;
}

composition:
    if (pc1 < 3 && (pc2 > 0 || !cond1) && (pc2 == 3 || (pc2 == 0 && cond2)))
        step(1);
    else if (pc2 < 3 && (pc1 > 0 || !cond2) && (pc1 == 3 || (pc1 == 0 && cond1)))
        step(2);
    else step(1,2);

cond1 := i1 < len1 && A1[i1] < h1
cond2 := i2 < len2 && A2[i2] < h2

predicates: i1 = i2, i1 < len1, i2 < len2, A1[i1] < h1, A2[i2] < h2, len1 = len2,
            len1 = len2 + 1, len2 = len1 + 1

Fig. 1. Constant-time insert to an array.


In contrast, when verifying the 2-safety property, PDSC directs its search towards a
composition function for which an inductive invariant in $\mathcal{L}$ does exist. As such, it infers
the composition function depicted in Fig. 1, as well as an inductive invariant in $\mathcal{L}$. The
invariant for this composition implies that $i_1 = i_2$ at every state.
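The effect of lock-step composition on Example 1 can be observed concretely. The Python sketch below is a testing aid on concrete inputs, not a verification procedure: it models each copy of arrayInsert as a generator that yields the value of i after every step (the step granularity is an assumption of the sketch), runs two copies in lock step, and records the pairs $(i_1, i_2)$. The trace passes through states with $i_1 \ne i_2$ even though the final outputs agree, which is why no invariant stating $i_1 = i_2$ at every state exists for the lock-step composition.

```python
from itertools import zip_longest
from typing import Iterator, List, Tuple


def array_insert_steps(A: List[int], h: int) -> Iterator[int]:
    """Execute arrayInsert on a copy of A, yielding the value of i after each step."""
    A = list(A)
    length = len(A)
    i = 0
    while i < length and A[i] < h:   # first loop: its iteration count depends on h
        i += 1
        yield i
    A.insert(i, h)                   # models shift_array(A, i, 1); A[i] = h
    length += 1
    yield i
    while i < length:                # second loop pads i up to the new length
        i += 1
        yield i


def lock_step(A: List[int], h1: int, h2: int) -> List[Tuple[int, int]]:
    """Lock-step self composition: both copies move at every step; a copy that has
    already terminated keeps its last value of i."""
    trace = []
    i1 = i2 = 0
    for s1, s2 in zip_longest(array_insert_steps(A, h1), array_insert_steps(A, h2)):
        i1 = i1 if s1 is None else s1
        i2 = i2 if s2 is None else s2
        trace.append((i1, i2))
    return trace
```

For `A = [1, 2, 3]`, `h1 = 0` and `h2 = 10`, the lock-step trace is `(0, 1), (1, 2), (2, 3), (3, 3), (4, 4)`: the copies are out of step during the first loop, yet both return 4.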


As demonstrated by the example, PDSC focuses on logical languages based on predicate
abstraction [17], where inductive invariants can be inferred by model checking. In
order to infer a composition function that admits an inductive invariant in $\mathcal{L}$, PDSC starts
from a default composition function and modifies its definition based on the reasoning
performed by the model checker during verification. As the composition function is
part of the verified model (recall that it is defined over the program state), different
compositions are part of the state space explored by the model checker. As a result, a
key ingredient of PDSC is identifying “bad” compositions that prevent it from finding
an inductive invariant in $\mathcal{L}$. It is important to note that a naive algorithm that tries all
possible composition functions has a time complexity of $O(2^{2^{|P|}})$, where $P$ is the set of
predicates considered. However, integrating the search for a composition function into
the model checking algorithm allows us to reduce the time complexity of the algorithm
to $2^{O(|P|)}$, and we show that the problem is in fact PSPACE-hard.²

We implemented PDSC using SEAHORN [19], Z3 [25] and SPACER [22] and evalu- 
ated it on examples that demonstrate the need for nontrivial semantic compositions. Our 
results clearly show that PDSC can solve complex examples by inferring the required 
composition, while other tools cannot verify these examples. We emphasize that for 
these particular examples, lock-step composition is not sufficient. We also evaluated 
PDSC on the examples from [26,30] that are proven with the trivial lock-step composition. On these examples, PDSC is comparable to state-of-the-art tools.


Related Work. This paper addresses the problem of verifying k-safety properties (also 
called hyperproperties [8]) by means of self composition. Other approaches tackle the 
problem without self composition, and often focus on more specific properties, most
notably the 2-safety noninterference property (e.g., [1,33]). Below we focus on works
that use self-composition. 


² Proofs of the claims made in this paper can be found in the extended version [29].
Previous work such as [2–4,14,15,32] considered self composition (also called product programs) where the composition function is constant and set a priori, using syntax-based hints. While useful in general, such self compositions may sometimes result in programs that are too complex to verify. This is in contrast to our approach, where the composition function evolves during verification and is adapted to the capabilities of the model checker.

The work most closely related to ours is [30], which introduces Cartesian Hoare
Logic (CHL) for verification of k-safety properties, and designs a verification frame- 
work for this logic. This work is further improved in [26]. These works search for a 
proof in CHL, and in doing so, implicitly modify the composition. Our work infers the 
composition explicitly and can use off-the-shelf model checking tools. More impor- 
tantly, when loops are involved both [30] and [26] use lock-step composition and align 
loops syntactically. Our algorithm, in contrast, does not rely on syntactic similarities, 
and can handle loops that cannot be aligned trivially. 

There have been several results in the context of harnessing Constrained Horn Clause (CHC) solvers for verification of relational properties [11,24]. Given several copies of a CHC system, a product CHC system that synchronizes the different copies is created by a syntactic analysis of the rules in the CHC system. These works restrict the synchronization points to CHC predicates (i.e., program locations), and consider only one synchronization (obtained via transformations of the system of CHCs). In contrast, our algorithm iteratively searches for a good synchronization (composition), and considers synchronizations that depend on program state.


Equivalence Checking and Regression Verification. Equivalence checking is another 
closely related research field, where a composition of several programs is considered. 
As an example, equivalence checking is applied to verify the correctness of compiler 
optimizations [10,18,28,34]. In [28] the composition is determined by a brute-force
search for possible synchronization points. While this brute-force search resembles our 
approach for finding the correct composition, it is not guided by the verification process. 
The works in [10,18] identify possible synchronization points syntactically, and try to 
match them during the construction of a simulation relation between programs. 
Regression verification also requires the ability to show equivalence between dif- 
ferent versions of a program [15,16,31]. The problem of synchronizing unbalanced 
loops appears in [31] in the form of unbalanced recursive function calls. To allow syn- 
chronization in such cases, the user can specify different unrolling parameters for the 
different copies. In contrast, our approach relies only on user-supplied predicates that
are needed to establish correctness, while synchronization is handled automatically. 


2 Preliminaries 


In this paper we reason about programs by means of the transition systems defining their semantics. A transition system is a tuple $T = (S, R, F)$, where $S$ is a set of states, $R \subseteq S \times S$ is a transition relation that specifies the steps in an execution of the program, and $F \subseteq S$ is a set of terminal states such that every terminal state $s \in F$ has an outgoing transition to itself and no additional transitions (terminal states allow us to reason about pre/post specifications of programs). An execution or trace $\pi = s_0, s_1, \ldots$ is a (finite or infinite) sequence of states such that for every $i \geq 0$, $(s_i, s_{i+1}) \in R$. The execution is terminating if there exists $0 \leq i < |\pi|$ such that $s_i \in F$. In this case, the suffix of the execution is of the form $s_i, s_i, \ldots$ and we say that $\pi$ ends at $s_i$.

As usual, we represent transition systems using logical formulas over a set of vari- 
ables, corresponding to the program variables. We denote the set of variables by V. The 
set of terminal states is represented by a formula over V and the transition relation is 
represented by a formula over $V \uplus V'$, where $V$ represents the pre-state of a transition
and $V' = \{v' \mid v \in V\}$ represents its post-state. In the sequel, we use sets of states and
their symbolic representation via formulas interchangeably. 


Safety and Inductive Invariants. We consider safety properties defined via pre/post conditions.³ A safety property is a pair $(pre, post)$ where $pre, post$ are formulas over $V$, representing subsets of $S$, denoting the pre- and post-condition, respectively. $T$ satisfies $(pre, post)$, denoted $T \models (pre, post)$, if every terminating execution $\pi$ of $T$ that starts in a state $s_0$ such that $s_0 \models pre$ ends in a state $s$ such that $s \models post$. In other words, for every state $s$ that is reachable in $T$ from a state in $pre$ we have that $s \models F \rightarrow post$.

A prominent way to verify safety properties is by finding an inductive invariant. 
An inductive invariant for a transition system T and a safety property (pre, post) is a 
formula $Inv$ such that (1) $pre \Rightarrow Inv$ (initiation), (2) $Inv \wedge R \Rightarrow Inv'$ (consecution), and (3) $Inv \Rightarrow (F \rightarrow post)$ (safety), where $\varphi \Rightarrow \psi$ denotes the validity of $\varphi \rightarrow \psi$, and $\varphi'$ denotes $\varphi(V')$, i.e., the formula obtained after substituting every $v \in V$ by the corresponding $v' \in V'$. If there exists such an inductive invariant, then $T \models (pre, post)$.
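The three conditions can be checked directly when the system is finite and given explicitly. The following Python sketch does so for a candidate $Inv$ given as a set of states; the explicit-state representation and the function name are assumptions made for illustration, whereas the paper works with symbolic formulas.

```python
from typing import Callable, Hashable, Iterable, Set, Tuple

State = Hashable


def is_inductive_invariant(
    inv: Set[State],
    pre: Iterable[State],
    transitions: Iterable[Tuple[State, State]],
    terminal: Set[State],
    post: Callable[[State], bool],
) -> bool:
    """Check the three conditions for an explicitly given candidate Inv."""
    # (1) initiation: pre => Inv
    if not all(s in inv for s in pre):
        return False
    # (2) consecution: Inv /\ R => Inv'
    if any(s in inv and t not in inv for s, t in transitions):
        return False
    # (3) safety: Inv => (F -> post)
    return all(post(s) for s in inv if s in terminal)


# A counter that moves 0 -> 1 -> 2 -> 3; state 3 is terminal (self loop).
R = {(0, 1), (1, 2), (2, 3), (3, 3)}
F = {3}
print(is_inductive_invariant({0, 1, 2, 3}, {0}, R, F, lambda s: s == 3))  # True
print(is_inductive_invariant({0, 1}, {0}, R, F, lambda s: s == 3))        # False
```

The second candidate fails consecution because the transition from 1 to 2 leaves it.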


k-safety. A k-safety property refers to k interacting executions of T. Similarly to an 
ordinary property, it is defined by (pre, post), except that pre and post are defined over 
$V^1 \uplus \cdots \uplus V^k$, where $V^i = \{v^i \mid v \in V\}$ denotes the $i$th copy of the program variables. As such, $pre$ and $post$ represent sets of $k$-tuples of program states ($k$-states for short): for a $k$-tuple $(s_1, \ldots, s_k)$ of states and a formula $\varphi$ over $V^1 \uplus \cdots \uplus V^k$, we say that $(s_1, \ldots, s_k) \models \varphi$ if $\varphi$ is satisfied when, for each $i$, the assignment of $V^i$ is determined by $s_i$. We say that $T$ satisfies $(pre, post)$, denoted $T \models^k (pre, post)$, if for every $k$ terminating executions $\pi^1, \ldots, \pi^k$ of $T$ that start in states $s_1, \ldots, s_k$, respectively, such that $(s_1, \ldots, s_k) \models pre$, it holds that they end in states $t_1, \ldots, t_k$, respectively, such that $(t_1, \ldots, t_k) \models post$.

For example, the noninterference property may be specified by the following 2-safety property: $pre = \bigwedge_{v \in LowIn} v^1 = v^2$, $post = \bigwedge_{v \in LowOut} v^1 = v^2$, where $LowIn$ and $LowOut$ denote subsets of the program inputs, resp. outputs, that are considered “low security”, and the rest are classified as “high security”. This property asserts that every two terminating executions that start in states that agree on the “low security” inputs end in states that agree on the “low security” outputs, i.e., the outcome does not depend on any “high security” input and, hence, does not leak secret information.
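For intuition, noninterference can be checked by brute force when input domains are tiny, by enumerating every pair of executions that agree on the low input. The Python sketch below is only illustrative: it models a program as a hypothetical function from a (low, high) input pair to a low output, and the names `noninterferent`, `secure` and `leaky` are invented for the example. Real programs need the self-composition reduction described next.

```python
from itertools import product
from typing import Callable, Iterable


def noninterferent(program: Callable[[int, int], int],
                   lows: Iterable[int], highs: Iterable[int]) -> bool:
    """Brute-force check of the 2-safety property: any two runs that agree
    on the low input (pre) must agree on the low output (post)."""
    low_values, high_values = list(lows), list(highs)
    for low, h1, h2 in product(low_values, high_values, high_values):
        # pre: low^1 = low^2 holds by construction; post: out^1 = out^2
        if program(low, h1) != program(low, h2):
            return False
    return True


def secure(low: int, high: int) -> int:
    return 2 * low            # ignores the secret input


def leaky(low: int, high: int) -> int:
    return low + (high % 2)   # reveals the parity of the secret input


print(noninterferent(secure, range(4), range(4)))  # True
print(noninterferent(leaky, range(4), range(4)))   # False
```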

Checking k-safety properties reduces to checking ordinary safety properties by cre- 
ating a self composed program that consists of k copies of the transition system, each 


³ Our results can be extended to arbitrary safety (and k-safety) properties by introducing
“observable” states to which the property may refer. 
with its own copy of the variables, that run in parallel in some way. Thus, the self composed program is defined over variables $V^{\|k} = V^1 \uplus \cdots \uplus V^k$, where $V^i = \{v^i \mid v \in V\}$ denotes the variables associated with the $i$th copy. For example, a common composition is a lock-step composition in which the copies execute simultaneously. The resulting composed transition system $T^{\|k} = (S^{\|k}, R^{\|k}, F^{\|k})$ is defined such that $S^{\|k} = S \times \cdots \times S$, $F^{\|k} = \bigwedge_{i=1}^{k} F(V^i)$ and $R^{\|k} = \bigwedge_{i=1}^{k} R(V^i, V^{i\prime})$. Note that $R^{\|k}$ is defined over $V^{\|k} \uplus V^{\|k\prime}$ (as usual). Then, the $k$-safety property $(pre, post)$ is satisfied by $T$ if and only if the ordinary safety property $(pre, post)$ is satisfied by $T^{\|k}$. More general notions of self composition are investigated in Sect. 3.
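The lock-step product can be built explicitly for a small finite system. In this Python sketch (an explicit-state illustration with a hypothetical function name, not the symbolic construction used in the paper), every copy takes one step of $R$ simultaneously, and a $k$-state is terminal when all of its components are terminal.

```python
from itertools import product
from typing import Hashable, Set, Tuple

State = Hashable


def lockstep_composition(states: Set[State],
                         transitions: Set[Tuple[State, State]],
                         terminal: Set[State], k: int):
    """Explicit lock-step self composition T^{||k} of a finite system."""
    kstates = set(product(states, repeat=k))
    # successors of each single state under R
    succ = {s: [t for (u, t) in transitions if u == s] for s in states}
    k_transitions = set()
    for ks in kstates:
        # R^{||k}: every copy takes one step of R at the same time
        for nxt in product(*(succ[s] for s in ks)):
            k_transitions.add((ks, nxt))
    # F^{||k}: all copies are terminal
    k_terminal = {ks for ks in kstates if all(s in terminal for s in ks)}
    return kstates, k_transitions, k_terminal


S2, R2, F2 = lockstep_composition({0, 1, 2}, {(0, 1), (1, 2), (2, 2)}, {2}, k=2)
print(((0, 1), (1, 2)) in R2)  # True: both copies step together
```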


3 Inferring Self Compositions for Restricted Languages of 
Inductive Invariants 


Any fair self composition (Definition 3) is sufficient for reducing k-safety to safety, e.g., lock-step, sequential, synchronous, asynchronous, etc. However, the choice of the self composition determines the difficulty of the resulting safety problem. Different self composed programs may require different inductive invariants, some of which cannot be expressed in a given logical language.

In this section, we formulate the problem of inferring a self composition function 
such that the obtained self composed program may be verified with a given language of 
inductive invariants. We are, therefore, interested in inferring both the self composition 
function and the inductive invariant for verifying the resulting self composed program. 
We start by formulating the kind of self compositions that we consider. 

In the sequel, we fix a transition system T = (S,R,F) with a set of 
variables V. 


3.1 Semantic Self Composition 


Roughly speaking, a k self composition of T consists of k copies of T that execute 
together in some order, where steps may interleave or be performed simultaneously. 
The order is determined by a self composition function, which may also be viewed as 
a scheduler that is responsible for scheduling a subset of the copies in each step. We 
consider semantic compositions in which the order may depend on the states of the 
different copies, as well as the correlations between them (as opposed to syntactic com- 
positions that only depend on the control locations of the copies, but may not depend 
on the values of other variables): 


Definition 1 (Semantic Self Composition Function). A semantic k self composition 
function ($k$-composition function for short) is a function $f : S^k \rightarrow \mathcal{P}(\{1..k\})$, mapping each $k$-state to a nonempty set of copies that are to participate in the next step of the self composed program.⁴


⁴ We consider memoryless composition functions. Compositions that depend on the history of
the (joint) execution are supported via ghost state added to the program to track the history. 
We represent a $k$-composition function $f$ by a set of logical conditions, with a condition $C_M$ for every nonempty subset $M \subseteq \{1..k\}$ of the copies. For each such $M \subseteq \{1..k\}$, the condition $C_M$ is defined over $V^{\|k} = V^1 \uplus \cdots \uplus V^k$, and hence it represents a set of $k$-states, with the meaning that all the $k$-states that satisfy $C_M$ are mapped to $M$ by $f$:

$$f(s_1, \ldots, s_k) = M \iff (s_1, \ldots, s_k) \models C_M.$$

To ensure that the function is well defined, we require that $(\bigvee_M C_M) \equiv true$, which ensures that every $k$-state satisfies at least one of the conditions. We also require that for every $M_1 \neq M_2$, $C_{M_1} \wedge C_{M_2} \equiv false$, hence every $k$-state satisfies at most one condition. Together these requirements ensure that the conditions induce a partition of the set of all $k$-states. In the sequel, we identify a $k$-composition function $f$ with its symbolic representation via conditions $\{C_M\}_M$ and use them interchangeably.
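The two well-definedness requirements (coverage and disjointness) can be checked over a finite set of $k$-states. The Python sketch below represents each condition $C_M$ as a hypothetical predicate on $k$-state tuples (copies numbered from 1, tuple positions from 0) and returns $f$ as a lookup table; the "let the copy that is behind move" composition is an invented semantic example.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, Hashable, Iterable, Tuple

KState = Tuple[Hashable, ...]
Condition = Callable[[KState], bool]


def function_from_conditions(conditions: Dict[FrozenSet[int], Condition],
                             kstates: Iterable[KState]) -> Dict[KState, FrozenSet[int]]:
    """Return f as a table, checking that {C_M}_M partitions the k-states."""
    f = {}
    for ks in kstates:
        matching = [m for m, cond in conditions.items() if cond(ks)]
        # coverage: at least one condition; disjointness: at most one
        if len(matching) != 1:
            raise ValueError(f"{ks} satisfies {len(matching)} conditions")
        f[ks] = matching[0]
    return f


conds = {
    frozenset({1}): lambda s: s[0] < s[1],       # copy 1 is behind: advance it
    frozenset({2}): lambda s: s[0] > s[1],       # copy 2 is behind: advance it
    frozenset({1, 2}): lambda s: s[0] == s[1],   # in sync: lock-step
}
f = function_from_conditions(conds, product(range(3), repeat=2))
print(f[(0, 2)])  # frozenset({1})
```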


Definition 2 (Composed Program). Given a $k$-composition function $f$, represented via conditions $C_M$ for every nonempty set $M \subseteq \{1..k\}$, we define the $k$ self composition of $T$ to be the transition system $T^f = (S^{\|k}, R^f, F^{\|k})$ over variables $V^{\|k} = V^1 \uplus \cdots \uplus V^k$ defined as follows: $F^{\|k} = \bigwedge_{i=1}^{k} F^i$, where $F^i = F(V^i)$, and

$$R^f = \bigvee_{\emptyset \neq M \subseteq \{1..k\}} \left( C_M \wedge \Psi_M \right), \quad \text{where} \quad \Psi_M = \bigwedge_{i \in M} R(V^i, V^{i\prime}) \wedge \bigwedge_{i \notin M} V^i = V^{i\prime}.$$

Thus, in $T^f$, the set of states consists of $k$-states ($S^{\|k} = S \times \cdots \times S$), the terminal states are $k$-states in which all the individual states are terminal, and the transition relation includes a transition from $(s_1, \ldots, s_k)$ to $(s'_1, \ldots, s'_k)$ if and only if $f(s_1, \ldots, s_k) = M$ and $(\forall i \in M.\ (s_i, s'_i) \in R) \wedge (\forall i \notin M.\ s_i = s'_i)$. That is, every transition of $T^f$ corresponds to a simultaneous transition of a subset $M$ of the $k$ copies of $T$, where the subset is determined by the self composition function $f$. If $f(s_1, \ldots, s_k) = M$, then for every $i \in M$ we say that $i$ is scheduled in $(s_1, \ldots, s_k)$.
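The transition relation $R^f$ can be read operationally: compute $M = f(s_1, \ldots, s_k)$, let the scheduled copies take an $R$-step and keep the others unchanged. The Python sketch below enumerates successors this way for an explicit finite $T$; the function `catch_up` is a hypothetical state-dependent composition chosen for illustration.

```python
from itertools import product
from typing import Callable, FrozenSet, Hashable, Set, Tuple

State = Hashable
KState = Tuple[State, ...]


def composed_successors(ks: KState,
                        f: Callable[[KState], FrozenSet[int]],
                        transitions: Set[Tuple[State, State]]) -> Set[KState]:
    """Successors of ks in R^f: copies in M = f(ks) step, the others stay put."""
    m = f(ks)
    options = []
    for i, s in enumerate(ks, start=1):  # copies are numbered 1..k
        if i in m:
            options.append([t for (u, t) in transitions if u == s])
        else:
            options.append([s])           # v^i = v^i' for unscheduled copies
    return set(product(*options))


def catch_up(ks: KState) -> FrozenSet[int]:
    # hypothetical semantic composition: let the copy that is behind move
    if ks[0] < ks[1]:
        return frozenset({1})
    if ks[0] > ks[1]:
        return frozenset({2})
    return frozenset({1, 2})


R = {(0, 1), (1, 2), (2, 2)}
print(composed_successors((0, 1), catch_up, R))  # {(1, 1)}
```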


Example 2. A $k$ self composition that runs the $k$ copies of $T$ sequentially, one after the other, corresponds to a $k$-composition function $f$ defined by $f(s_1, \ldots, s_k) = \{i\}$, where $i \in \{1..k\}$ is the minimal index of a non-terminal state in $\{s_1, \ldots, s_k\}$. If all states in $\{s_1, \ldots, s_k\}$ are terminal, then $i = k$ (or any other index). This is encoded as follows: for every $1 \leq i < k$, $C_{\{i\}} = \neg F^i \wedge \bigwedge_{j<i} F^j$, $C_{\{k\}} = \bigwedge_{j<k} F^j$, and $C_M = false$ for every other $M \subseteq \{1..k\}$.


Example 3. The lock-step composition that runs the $k$ copies of $T$ synchronously corresponds to a $k$-self composition function $f$ defined by $f(s_1, \ldots, s_k) = \{1, \ldots, k\}$, and encoded by $C_{\{1, \ldots, k\}} = true$ and $C_M = false$ for every other $M \subseteq \{1..k\}$.
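Examples 2 and 3 translate directly into functions on explicit $k$-states. In this Python sketch, `is_terminal` is a hypothetical membership test for $F$, and copies are numbered from 1 as in the text.

```python
from typing import Callable, FrozenSet, Hashable, Tuple

KState = Tuple[Hashable, ...]


def sequential(ks: KState, is_terminal: Callable[[Hashable], bool]) -> FrozenSet[int]:
    """Example 2: schedule the minimal copy i whose state is non-terminal;
    if all copies are terminal, schedule copy k."""
    for i, s in enumerate(ks, start=1):
        if not is_terminal(s):
            return frozenset({i})
    return frozenset({len(ks)})


def lockstep(ks: KState) -> FrozenSet[int]:
    """Example 3: all copies step together."""
    return frozenset(range(1, len(ks) + 1))


def done(s: Hashable) -> bool:
    return s == "done"


print(sequential(("done", "run"), done))  # frozenset({2})
print(lockstep((0, 0, 0)))                # frozenset({1, 2, 3})
```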


In order to ensure soundness of a reduction of k-safety to safety via self composi- 
tion, one has to require that the self composition function does not “starve” any copy 
of the transition system that is about to terminate if it continues to execute. We refer to 
this requirement as fairness. 




Definition 3 (Fairness). A $k$-self composition function $f$ is fair if for every $k$ terminating executions $\pi^1, \ldots, \pi^k$ of $T$ there exists an execution $\pi^{\|}$ of $T^f$ such that for every copy $i \in \{1..k\}$, the projection of $\pi^{\|}$ to $i$ is $\pi^i$.

Note that by the definition of the terminal states of $T^f$, $\pi^{\|}$ as above is guaranteed to be terminating. We say that the $i$th copy terminates in $\pi^{\|}$ if $\pi^{\|}$ contains a $k$-state $(s_1, \ldots, s_k)$ such that $s_i \in F$. Fairness may be enforced in a straightforward way by requiring that whenever $f(s_1, \ldots, s_k) = M$, the set $M$ includes no index $i$ for which $s_i \in F$, unless all have terminated. Since we assume that terminal states may only transition to themselves, a weaker requirement that suffices to ensure fairness is that $M$ includes at least one index $i$ for which $s_i \notin F$, unless there is no such index.
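The weaker, sufficient requirement is easy to check over a finite set of $k$-states. This Python sketch (hypothetical function names; explicit states instead of formulas) rejects the non-fair function that always returns $\{1\}$ and accepts lock-step. Note that it checks the sufficient condition from the text, not Definition 3 itself.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, Tuple

KState = Tuple[Hashable, ...]


def is_non_starving(ks: KState, m: FrozenSet[int],
                    is_terminal: Callable[[Hashable], bool]) -> bool:
    """M schedules at least one non-terminated copy, unless all have terminated."""
    running = {i for i, s in enumerate(ks, start=1) if not is_terminal(s)}
    return not running or bool(running & m)


def is_fair(f: Callable[[KState], FrozenSet[int]], kstates: Iterable[KState],
            is_terminal: Callable[[Hashable], bool]) -> bool:
    """Sufficient fairness check: f never starves on the given k-states."""
    return all(is_non_starving(ks, f(ks), is_terminal) for ks in kstates)


def done(s: Hashable) -> bool:
    return s == "done"


states = [("run", "run"), ("done", "run"), ("done", "done")]
print(is_fair(lambda ks: frozenset({1}), states, done))                  # False
print(is_fair(lambda ks: frozenset(range(1, len(ks) + 1)), states, done))  # True
```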

The following claim is now straightforward: 


Lemma 1. Let T be a transition system, (pre, post) a k-safety property, and f a fair 
k-composition function for T and (pre, post). Then 


$T \models^k (pre, post)$ iff $T^f \models (pre, post)$.


Proof (sketch). Every terminating execution of $T^f$ corresponds to $k$ terminating executions of $T$. Fairness of $f$ ensures that the converse also holds.


To demonstrate the necessity of the fairness requirement, consider a (non-fair) self 
composition function f that maps every state to {1}. Then, regardless of what the actual 
transition system $T$ does, the resulting self composition $T^f$ satisfies every pre-post
specification vacuously, as it never reaches a terminal state. 


Remark 1. While we require the conditions $\{C_M\}_M$ defining a self composition function $f$ to induce a partition of $S^{\|k}$ in order to ensure that $f$ is well defined as a (total) function, the requirement may be relaxed in two ways. First, we may allow $C_{M_1}$ and $C_{M_2}$ to overlap. This adds more transitions and may make the task of verifying the composed program more difficult, but it maintains the soundness of the reduction. Second, it suffices that the conditions cover the set of reachable states of the composed program rather than the entire state space. These relaxations do not damage soundness. Technically, this means that $f$ represented by the conditions is a relation rather than a function. We still refer to it as a function and write $f(s_1, \ldots, s_k) = M$ to indicate that $(s_1, \ldots, s_k) \models C_M$, not excluding the possibility that $f(s_1, \ldots, s_k) = M'$ for $M' \neq M$ as well. We note that as long as the language used to describe compositions is closed under Boolean operations, we can always extract from the conditions $\{C_M\}_M$ a function $f'$. This is done as follows. First, to prevent overlap between conditions, fix an arbitrary total order $<$ on the sets $M \subseteq \{1..k\}$ and set $C'_M := C_M \wedge \bigwedge_{N < M} \neg C_N$. Second, to ensure that the conditions cover the entire state space, replace $C'_{\{1..k\}}$ by $C'_{\{1..k\}} \vee \neg(\bigvee_M C_M)$. It is easy to verify that $f'$ defined by $\{C'_M\}_M$ is a total self composition function and that if $f$ is fair, then so is $f'$.
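The extraction of $f'$ amounts to "the first condition in the chosen order wins, and uncovered $k$-states go to $\{1..k\}$". The Python sketch below implements that reading for conditions given as predicates; the list order plays the role of the total order $<$, and all names are hypothetical.

```python
from typing import Callable, FrozenSet, Hashable, List, Tuple

KState = Tuple[Hashable, ...]
Condition = Callable[[KState], bool]


def make_total(ordered: List[Tuple[FrozenSet[int], Condition]],
               k: int) -> Callable[[KState], FrozenSet[int]]:
    """Build f' from possibly overlapping, possibly non-covering conditions.

    `ordered` lists (M, C_M) pairs in the chosen total order. Returning the
    first match realizes C'_M = C_M and not C_N for all N < M; k-states that
    satisfy no condition are mapped to {1..k}.
    """
    everyone = frozenset(range(1, k + 1))

    def f_prime(ks: KState) -> FrozenSet[int]:
        for m, cond in ordered:
            if cond(ks):
                return m
        return everyone

    return f_prime


overlap = make_total([(frozenset({1}), lambda s: s[0] <= s[1]),
                      (frozenset({2}), lambda s: s[0] >= s[1])], k=2)
print(overlap((1, 1)))  # frozenset({1}): the earlier condition wins
```

Sending uncovered states to $\{1..k\}$ keeps $f'$ non-starving, which is why fairness is preserved.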


3.2 The Problem of Inferring Self Composition with Inductive Invariant 


Lemma 1 states the soundness of the reduction of k-safety to ordinary safety. Together with the ability to verify safety by means of an inductive invariant, this leads to a verification procedure. However, while soundness of the reduction holds for any fair self composition, an inductive invariant in a given language may exist for the composed program resulting from some compositions but not from others. We therefore consider the self
composition function and the inductive invariant together, as a pair, leading to the fol- 
lowing definition. 


Definition 4. Let $T$ be a transition system and $(pre, post)$ a $k$-safety property. For a formula $Inv$ over $V^{\|k}$ and a self composition function $f$ represented by conditions $\{C_M\}_M$, we say that $(f, Inv)$ is a composition-invariant pair for $T$ and $(pre, post)$ if the following conditions hold:

- $pre \Rightarrow Inv$ (initiation of $Inv$),
- for every $\emptyset \neq M \subseteq \{1..k\}$, $Inv \wedge C_M \wedge \Psi_M \Rightarrow Inv'$ (consecution of $Inv$ for $R^f$),
- $Inv \Rightarrow ((\bigwedge_{i=1}^{k} F^i) \rightarrow post)$ (safety of $Inv$),
- $Inv \Rightarrow \bigvee_M C_M$ ($f$ covers the reachable states),
- for every $\emptyset \neq M \subseteq \{1..k\}$, $C_M \wedge (\bigvee_{j=1}^{k} \neg F^j) \Rightarrow \bigvee_{i \in M} \neg F^i$ ($f$ is fair).

As commented in Remark 1, we relax the requirement that $(\bigvee_M C_M) \equiv true$ to $Inv \Rightarrow \bigvee_M C_M$, thus ensuring that the conditions cover all the reachable states. Since the reachable states of $T^f$ are determined by $\{C_M\}_M$ (which define $f$), this reveals the interplay between the self composition function and the inductive invariant. Furthermore, we do not require that $C_{M_1} \wedge C_{M_2} \equiv false$ for $M_1 \neq M_2$, hence a $k$-state may satisfy multiple conditions. As explained earlier, these relaxations do not damage soundness. Moreover, if we construct from $f$ a self composition function $f'$ as described in Remark 1, $Inv$ would be an inductive invariant for $T^{f'}$ as well.
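For a finite system, the five conditions of Definition 4 can be checked by enumeration. The following Python sketch is an explicit-state illustration under stated assumptions: $Inv$ and $pre$ are sets of $k$-states, conditions are predicates (possibly overlapping), fairness is checked over a supplied set of all $k$-states, and every name is hypothetical.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, Hashable, Iterable, Set, Tuple

State = Hashable
KState = Tuple[State, ...]
Condition = Callable[[KState], bool]


def is_composition_invariant_pair(
    conditions: Dict[FrozenSet[int], Condition],
    inv: Set[KState],
    pre: Set[KState],
    all_kstates: Iterable[KState],
    transitions: Set[Tuple[State, State]],
    is_terminal: Callable[[State], bool],
    post: Callable[[KState], bool],
) -> bool:
    """Explicit-state check of the five conditions of Definition 4."""

    def successors(ks: KState, m: FrozenSet[int]):
        # Psi_M: scheduled copies take an R-step, the others keep their state
        options = [[t for (u, t) in transitions if u == s] if i in m else [s]
                   for i, s in enumerate(ks, start=1)]
        return product(*options)

    if not pre <= inv:                                        # initiation
        return False
    for ks in inv:
        matching = [m for m, cond in conditions.items() if cond(ks)]
        if not matching:                                      # f covers Inv
            return False
        for m in matching:                                    # consecution
            if any(nxt not in inv for nxt in successors(ks, m)):
                return False
        if all(is_terminal(s) for s in ks) and not post(ks):  # safety
            return False
    for ks in all_kstates:                                    # fairness
        running = {i for i, s in enumerate(ks, start=1) if not is_terminal(s)}
        for m, cond in conditions.items():
            if cond(ks) and running and not running & m:
                return False
    return True


R = {(0, 1), (1, 2), (2, 2)}
ALL = set(product(range(3), repeat=2))
catch_up = {
    frozenset({1}): lambda ks: ks[0] < ks[1],
    frozenset({2}): lambda ks: ks[0] > ks[1],
    frozenset({1, 2}): lambda ks: ks[0] == ks[1],
}
print(is_composition_invariant_pair(catch_up, {(0, 1), (1, 1), (2, 2)}, {(0, 1)},
                                    ALL, R, lambda s: s == 2,
                                    lambda ks: ks[0] == ks[1]))  # True
```

The same invariant does not work for lock-step, since $(0, 1)$ steps to $(1, 2)$, which illustrates how the composition and the invariant must be chosen together.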


Lemma 2. If there exists a composition-invariant pair $(f, Inv)$ for $T$ and $(pre, post)$, then $T \models^k (pre, post)$.


If we do not restrict the language in which f and Inv are specified, then the converse 
also holds. However, in the sequel we are interested in the ability to verify k-safety with 
a given language, e.g., one for which the conditions of Definition 4 belong to a decidable 
fragment of logic and hence can be discharged automatically. 


Definition 5 (Inference in $\mathcal{L}$). Let $\mathcal{L}$ be a logical language. The problem of inferring a composition-invariant pair in $\mathcal{L}$ is defined as follows. The input is a transition system $T$ and a $k$-safety property $(pre, post)$. The output is a composition-invariant pair $(f, Inv)$ for $T$ and $(pre, post)$ (as defined in Definition 4), where $Inv \in \mathcal{L}$ and $f$ is represented by conditions $\{C_M\}_M$ such that $C_M \in \mathcal{L}$ for every $\emptyset \neq M \subseteq \{1..k\}$. If no such pair exists, the output is “no solution”.


When no solution exists, it does not necessarily mean that $T \not\models^k (pre, post)$. Instead, it may be that the language $\mathcal{L}$ is simply not expressive enough. Unfortunately, for expressive languages (e.g., quantified formulas or even quantifier-free linear integer arithmetic), the problem of inferring an inductive invariant alone is already undecidable, making the problem of inferring a composition-invariant pair undecidable as well:


Lemma 3. Let $\mathcal{L}$ be closed under Boolean operations and under substitution of a variable with a value, and include equalities of the form $v = a$, where $v$ is a variable and $a$ is a value (of the same sort). If the problem of inferring an inductive invariant in $\mathcal{L}$ is undecidable, then so is the problem of inferring a composition-invariant pair in $\mathcal{L}$.
For example, linear integer arithmetic satisfies the conditions of the lemma. This motivates us to restrict the languages of inductive invariants. Specifically, we consider languages defined by a finite set of predicates. We consider relational predicates, defined over $V^{\|k} = V^1 \uplus \cdots \uplus V^k$. For a finite set of predicates $P$, we define $\mathcal{L}_P$ to be the set of all formulas obtained by Boolean combinations of the predicates in $P$.


Definition 6 (Inference using predicate abstraction). The problem of inferring a 
predicate-based composition-invariant pair is defined as follows. The input is a tran- 
sition system T, a k-safety property (pre, post), and a finite set of predicates P. The 
output is the solution to the problem of inferring a composition-invariant pair for T 
and $(pre, post)$ in $\mathcal{L}_P$.


Remark 2. It is possible to decouple the language used for expressing the self com- 
position function from the language used to express the inductive invariant. Clearly, 
different sets of predicates (and hence languages) can be assigned to the self compo- 
sition function and to the inductive invariant. However, since inductiveness is defined 
with respect to the transitions of the composed system, which are in turn defined by the 
self composition function, if the language defining f is not included in the language 
defining Inv, the conditions $C_M$ themselves would be over-approximated when checking the requirements of Definition 4 and would therefore incur a precision loss. For this
reason, we use the same language for both. 


Since the problem of invariant inference in $\mathcal{L}_P$ is PSPACE-hard [23], a reduction from the problem of inferring inductive invariants to the problem of inferring composition-invariant pairs (similar to the one used in the proof of Lemma 3) shows that composition-invariant inference in $\mathcal{L}_P$ is also PSPACE-hard:


Theorem 1. Inferring a predicate-based composition-invariant pair is PSPACE-hard. 


4 Algorithm for Inferring Composition-Invariant Pairs 


In this section, we present Property Directed Self-Composition (PDSC for short), our algorithm for tackling the composition-invariant inference problem for languages of predicates (Definition 6). Namely, given a transition system $T$, a $k$-safety property $(pre, post)$ and a finite set of predicates $P$, we address the problem of finding a pair $(f, Inv)$, where $f$ is a self composition function and $Inv$ is an inductive invariant for the composed transition system $T^f$ obtained from $f$, and both of them are in $\mathcal{L}_P$, i.e., defined by Boolean combinations of the predicates in $P$.

We rely on the property that a transition system (in our case $T^f$) has an inductive invariant in $\mathcal{L}_P$ if and only if its abstraction obtained using $P$ is safe. This is because the set of reachable abstract states is the strongest set expressible in $\mathcal{L}_P$ that satisfies initiation and consecution. Given a candidate $f$, this allows us to use predicate abstraction to either obtain an inductive invariant in $\mathcal{L}_P$ for $T^f$ (if the abstraction of $T^f$ is safe) or determine that no such inductive invariant exists (if an abstract counterexample trace is obtained). The latter indicates that a different self composition function needs to be considered. A naive realization of this idea gives rise to an iterative algorithm that starts from an arbitrary initial composition function and in each iteration computes a new composition function. In the worst case, such an algorithm enumerates all self composition functions defined in $\mathcal{L}_P$, i.e., has time complexity $O(2^{2^{|P|}})$. Importantly, we observe that, when no inductive invariant exists for some composition function, we can use the abstract counterexample trace returned in this case to (i) generalize and eliminate multiple composition functions, and (ii) identify that some abstract states must be unreachable if there is to be a composition-invariant pair, i.e., we “block” states in the spirit of property directed reachability [5,13]. This leads to the algorithm depicted in Algorithm 1, whose worst-case time complexity is $2^{O(|P|)}$. Next, we explain the algorithm in detail.

1   f ← lockstep, E ← ∅, Unreach ← false
2   while (true) do
3       (res, Inv, cex) ← Abs_Reach(P, T^f, pre, post, Unreach)
4       if res = safe then return (f, Inv(P))
5       (ŝ, M) ← Last_Step(cex)
6       E ← E ∪ {(ŝ, M)}
7       while (All_Excluded_Or_Starving(ŝ, E)) do
8           Unreach ← Unreach ∨ ŝ
9           if Unreach ∧ φ_pre(B) ⊭ false then return “no solution in ℒ_P”
10          cex ← Remove_Last_Step(cex)
11          (ŝ, M) ← Last_Step(cex)
12          E ← E ∪ {(ŝ, M)}
13      f ← Modify_SC(f, ŝ, E)
Algorithm 1. PDSC: Property-Directed Self-Composition.


Finding an Inductive Invariant for a Given Composition Function Using Predicate 
Abstraction. We use predicate abstraction [17,27] to check if a given candidate composition function has a corresponding inductive invariant. This is done as follows. The abstraction of $T^f$ using $P$, denoted $\mathcal{A}_P(T^f)$, is a transition system $(\hat S, \hat R)$ defined over variables $B$, where $B = \{b_p \mid p \in P\}$ (we omit the terminal states). $\hat S = \{0, 1\}^B$, i.e., each abstract state corresponds to a valuation of the Boolean variables representing $P$. An abstract state $\hat s \in \hat S$ represents the following set of states of $T^f$:

$$\gamma(\hat s) = \{ s^{\|} \in S^{\|k} \mid \forall p \in P.\ s^{\|} \models p \iff \hat s(b_p) = 1 \}$$

We extend $\gamma$ to sets of states and to formulas representing sets of states in the usual way. The abstract transition relation is defined as usual:

$$\hat R = \{ (\hat s_1, \hat s_2) \mid \exists s^{\|}_1 \in \gamma(\hat s_1).\ \exists s^{\|}_2 \in \gamma(\hat s_2).\ (s^{\|}_1, s^{\|}_2) \in R^f \}$$

Note that the set of abstract states in $\mathcal{A}_P(T^f)$ does not depend on $f$.
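For a finite, explicitly listed $R^f$, the definitions of $\gamma$ and $\hat R$ can be computed by mapping every concrete state to its predicate valuation. The Python sketch below does this with predicates given as hypothetical Python functions. It is explicit predicate abstraction for illustration only; Remark 3 notes that the paper instead uses implicit predicate abstraction.

```python
from itertools import product
from typing import Callable, Hashable, Iterable, List, Set, Tuple

State = Hashable
AbsState = Tuple[bool, ...]
Predicates = List[Callable[[State], bool]]


def alpha(s: State, predicates: Predicates) -> AbsState:
    """The abstract state of s: the valuation (b_p) for p in P."""
    return tuple(p(s) for p in predicates)


def gamma(a: AbsState, states: Iterable[State], predicates: Predicates) -> Set[State]:
    """Concretization: all states whose predicate valuation equals a."""
    return {s for s in states if alpha(s, predicates) == a}


def abstract_transitions(transitions: Iterable[Tuple[State, State]],
                         predicates: Predicates) -> Set[Tuple[AbsState, AbsState]]:
    """R-hat: (a1, a2) iff some s1 in gamma(a1), s2 in gamma(a2) with (s1, s2) in R^f."""
    return {(alpha(s, predicates), alpha(t, predicates)) for s, t in transitions}


P = [lambda ks: ks[0] == ks[1], lambda ks: ks[0] == 2]  # "in sync", "copy 1 done"
concrete = {((0, 1), (1, 1)), ((1, 1), (2, 2)), ((2, 2), (2, 2))}
print(abstract_transitions(concrete, P))
print(gamma((True, False), product(range(3), repeat=2), P))  # {(0, 0), (1, 1)}
```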


Notation. We sometimes refer to an abstract state $\hat s \in \hat S$ as the formula $\bigwedge_{\hat s(b_p)=1} b_p \wedge \bigwedge_{\hat s(b_p)=0} \neg b_p$. For a formula $\varphi \in \mathcal{L}_P$, we denote by $\varphi(B)$ the result of substituting each $p \in P$ in $\varphi$ by the corresponding Boolean variable $b_p$. For the opposite direction, given a formula $\psi$ over $B$, we denote by $\psi(P)$ the formula in $\mathcal{L}_P$ resulting from substituting each $b_p \in B$ in $\psi$ by $p$. Therefore, $\psi(P)$ is a symbolic representation of $\gamma(\psi)$.


Every set defined by a formula $\varphi \in \mathcal{L}_P$ is precisely represented by $\varphi(B)$ in the sense that $\gamma(\varphi(B))$ is equal to the set of states defined by $\varphi$, i.e., $\varphi(B)$ is a precise abstraction of $\varphi$. For simplicity, we assume that the termination conditions as well as the pre/post specification can be expressed precisely using the abstraction, in the following sense:


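Definition 7. $P$ is adequate for $T$ and $(pre, post)$ if there exist $\varphi_{pre}, \varphi_{post}, \varphi_{F^i} \in \mathcal{L}_P$ such that $\varphi_{pre} \equiv pre$, $\varphi_{post} \equiv post$ and $\varphi_{F^i} \equiv F^i$ (for every copy $i \in \{1..k\}$).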


The following lemma provides the foundation for our algorithm: 


Lemma 4. Let $T$ be a transition system, $(pre, post)$ a $k$-safety property, and $P$ a finite set of predicates adequate for $T$ and $(pre, post)$. For a self composition function $f$ defined via conditions $\{C_M\}_M$ in $\mathcal{L}_P$, there exists an inductive invariant $Inv$ in $\mathcal{L}_P$ such that $(f, Inv)$ is a composition-invariant pair for $T$ and $(pre, post)$ if and only if the following three conditions hold:

S1 All reachable states of $\mathcal{A}_P(T^f)$ from $\varphi_{pre}(B)$ satisfy $(\bigwedge_{i=1}^{k} \varphi_{F^i}(B)) \rightarrow \varphi_{post}(B)$,
S2 All reachable states of $\mathcal{A}_P(T^f)$ from $\varphi_{pre}(B)$ satisfy $\bigvee_M C_M(B)$, and
S3 For every $\emptyset \neq M \subseteq \{1..k\}$, $C_M(B) \wedge (\bigvee_{j=1}^{k} \neg\varphi_{F^j}(B)) \Rightarrow \bigvee_{i \in M} \neg\varphi_{F^i}(B)$.

Furthermore, if the conditions hold, then the symbolic representation of the set of abstract states of $\mathcal{A}_P(T^f)$ reachable from $\varphi_{pre}(B)$ is a formula $Inv$ over $B$ such that $(f, Inv(P))$ is a composition-invariant pair for $T$ and $(pre, post)$.


Algorithm 1 starts from the lock-step self composition function (Line 1), which
is fair,⁵ and constructs the next candidate $f$ such that condition S3 in Lemma 4
always holds (see discussion of Modify_SC). Thus, condition S3 need not be checked 
explicitly. 

Algorithm 1 checks whether conditions S1 and S2 hold for a given candidate composition function $f$ by calling Abs_Reach (Line 3); both checks are performed via a (non-)reachability check in $\mathcal{A}_P(T^f)$, checking whether a state violating $(\bigwedge_{i=1}^{k} \varphi_{F^i}(B)) \rightarrow \varphi_{post}(B)$ or $\bigvee_M C_M(B)$ is reachable from $\varphi_{pre}(B)$. Algorithm 1 maintains the abstract states that are not in $\bigvee_M C_M(B)$ by the formula Unreach defined over $B$, which is initialized to false (as the lock-step composition function is defined for every state) and is updated in each iteration of Algorithm 1 to include the abstract states violating $\bigvee_M C_M(B)$. If no abstract state violating S1 or S2 is reachable, i.e., the conditions hold, then Abs_Reach returns the (potentially overapproximated) set of reachable abstract states, represented by a formula $Inv$ over $B$. In this case, by Lemma 4, $(f, Inv(P))$ is a composition-invariant pair (Line 4). Otherwise, an abstract counterexample trace is obtained. (We can of course apply bounded model checking to check if
the counterexample is real; we omit this check as our focus is on the case where the 
system is safe.) 
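On an explicitly given abstract system, the role of Abs_Reach can be illustrated by a breadth-first search from the initial abstract states: it either returns the reachable set (from which $Inv$ would be read off) or a shortest path to a "bad" abstract state, where bad stands for violating S1 or lying in Unreach. The Python sketch below is such an illustration with a hypothetical function name; it is not the paper's implementation, which relies on a model checker with implicit predicate abstraction (Remark 3).

```python
from collections import deque
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Set, Tuple

AbsState = Hashable


def abs_reach(init: Iterable[AbsState],
              abs_transitions: Set[Tuple[AbsState, AbsState]],
              is_bad: Callable[[AbsState], bool]):
    """BFS over the abstract system. Returns ("safe", reachable_states) or
    ("cex", path), where path leads from an initial state to a bad state."""
    succ: Dict[AbsState, List[AbsState]] = {}
    for a, b in abs_transitions:
        succ.setdefault(a, []).append(b)
    parent: Dict[AbsState, Optional[AbsState]] = {}
    queue = deque()
    for a in init:
        if a not in parent:
            parent[a] = None
            queue.append(a)
    while queue:
        a = queue.popleft()
        if is_bad(a):
            # walk the parent links back to an initial state
            path = [a]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            return "cex", path[::-1]
        for b in succ.get(a, []):
            if b not in parent:
                parent[b] = a
                queue.append(b)
    return "safe", set(parent)


T_hat = {("a", "b"), ("b", "c"), ("c", "c"), ("d", "d")}
print(abs_reach(["a"], T_hat, lambda s: s == "d"))  # safe: d is unreachable
print(abs_reach(["a"], T_hat, lambda s: s == "c"))  # cex: a, b, c
```

The last two states of a returned path correspond to what Last_Step extracts in Algorithm 1.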


Remark 3. In practice, we do not construct $\mathcal{A}_P(T^f)$ explicitly. Instead, we use the
implicit predicate abstraction approach [6]. 


⁵ Any fair self composition can be chosen as the initial one; we chose lock-step since it is a good
starting point in many applications. 
Eliminating Self Composition Candidates Based on Abstract Counterexamples.
An abstract counterexample to conditions S1 or S2 indicates that the candidate com- 
position function f has no corresponding Inv. Violation of S1 can only be resolved by 
changing f such that the abstract trace is no longer feasible. Violation of S2 may, in 
principle, also be resolved by extending the definition of f such that it is defined for all 
the abstract states in the counterexample trace. 

However, to avoid exploring both options, our algorithm maintains the following invariant for every candidate self composition function $f$ that it constructs:

Claim. Every abstract state that is not in $\bigvee_M C_M(B)$ is not reachable w.r.t. the abstract
composed program of any composition function that is part of a composition-invariant 
pair for T and (pre, post). 


This property clearly holds for the lock-step composition function, which the algorithm starts with, since for this composition, $\bigvee_M C_M(B) \equiv true$. As we explain in Corollary 2, it continues to hold throughout the algorithm.

As a result of this property, whenever a candidate composition function f does not 
satisfy condition S1 or S2, it is never the case that $\bigvee_M C_M(B)$ needs to be extended
to allow the abstract states in cex to be reachable. Instead, the abstract counterexample 
obtained in violation of the conditions needs to be eliminated by modifying f. 

Let $cex = \hat s_1, \ldots, \hat s_{m+1}$ be an abstract counterexample of $\mathcal{A}_P(T^f)$ such that $\hat s_1 \models \varphi_{pre}(B)$ and $\hat s_{m+1} \models (\bigwedge_{i=1}^{k} \varphi_{F^i}(B)) \wedge \neg\varphi_{post}(B)$ (violating S1) or $\hat s_{m+1} \models Unreach$ (violating S2). Any self composition $f'$ that agrees with $f$ on the states in $\gamma(\hat s_i)$ for every $\hat s_i$ that appears in cex has the same transitions from these states in $R^{f'}$ as in $R^f$ and, hence, the same abstract transitions in $\hat R$. It therefore exhibits the same abstract counterexample in $\mathcal{A}_P(T^{f'})$. Hence, it violates S1 or S2 and is not part of any composition-invariant pair.


Notation. Recall that $f$ is defined via conditions $C_M \in \mathcal{L}_P$. This ensures that for every abstract state $\hat s$, $f$ is defined in the same way for all the states in $\gamma(\hat s)$. We denote the value of $f$ on the states in $\gamma(\hat s)$ by $f(\hat s)$ (in particular, $f(\hat s)$ may be undefined). We get that $f(\hat s) = M$ if and only if $\hat s \models C_M(B)$.


Using this notation, to eliminate the abstract counterexample cex, one needs to eliminate
at least one of the transitions in cex by changing the definition of $f(\hat{s}_i)$ for some
$1 \le i \le m$. For a new candidate function f' this may be encoded by the disjunctive
constraint $\bigvee_{i=1}^{m} f'(\hat{s}_i) \neq f(\hat{s}_i)$. However, we observe that a stronger requirement may
be derived from cex based on the following lemma:


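Lemma 5. Let f be a self composition function and $cex = \hat{s}_1, \ldots, \hat{s}_{m+1}$ a counterexample
trace in $\mathcal{A}_{\mathcal{P}}(T^f)$ such that $\hat{s}_1 \models \varphi_{pre}(B)$ but $\hat{s}_{m+1} \models (\bigwedge_{j=1}^{k} F_j(B)) \wedge \neg\varphi_{post}(B)$
or $\hat{s}_{m+1} \models \mathit{Unreach}$. Then for any self composition function f' such that
$f'(\hat{s}_m) = f(\hat{s}_m)$, if $\hat{s}_m$ is reachable in $\mathcal{A}_{\mathcal{P}}(T^{f'})$ from $\varphi_{pre}(B)$, then a counterexample
trace to S1 or S2 exists.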


Corollary 1. If there exists a composition-invariant pair $(f', Inv')$, then there is also
one where $f'(\hat{s}_m) \neq f(\hat{s}_m)$.
Therefore, we require that in the next self composition candidates the abstract state
$\hat{s}_m$ must not be mapped to its current value in f, i.e., $f'(\hat{s}_m) \neq M$, where $f(\hat{s}_m) = M$.$^{6}$
Algorithm 1 accumulates these constraints in the set E (Line 6). Formally, the constraint
$(\hat{s}, M) \in E$ asserts that $C'_M$ must imply $\neg(\bigwedge_{p:\hat{s}(p)=1} p \wedge \bigwedge_{p:\hat{s}(p)=0} \neg p)$, and hence
$f'(\hat{s}) \neq M$.
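The bookkeeping behind Lemma 5 and the set E can be illustrated with a small Python sketch. It assumes a hypothetical explicit encoding in which an abstract state is a tuple of predicate truth values, a value M is a frozenset of copy indices, and f is a dictionary. It is not the PDSC implementation (which is built on top of Z3); the names `cube`, `exclude_last_choice` and `respects` are illustrative only.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical encoding: an abstract state assigns a truth value to each predicate in P,
# and a value M of f is the (non-empty) set of copies scheduled to move.
AbsState = Tuple[bool, ...]
Value = FrozenSet[int]
Constraint = Tuple[AbsState, Value]


def cube(s: AbsState, preds: List[str]) -> str:
    """Conjunction of p (where s(p) = 1) and !p (where s(p) = 0)."""
    return " & ".join(p if b else "!" + p for p, b in zip(preds, s))


def exclude_last_choice(cex: List[AbsState], f: Dict[AbsState, Value],
                        E: Set[Constraint]) -> Constraint:
    """For cex = s_1, ..., s_m, s_{m+1}, record (s_m, f(s_m)) in E (cf. Lemma 5)."""
    s_m = cex[-2]
    constraint = (s_m, f[s_m])
    E.add(constraint)
    return constraint


def respects(f_new: Dict[AbsState, Value], E: Set[Constraint]) -> bool:
    """f' satisfies E iff f'(s) != M for every (s, M) in E; undefined counts as !=."""
    return all(f_new.get(s) != M for s, M in E)


# Usage on a three-state abstract counterexample over predicates p and q.
s1, s2, s3 = (True, True), (True, False), (False, False)
f = {s1: frozenset({0, 1}), s2: frozenset({0}), s3: frozenset({1})}
E: Set[Constraint] = set()
print(exclude_last_choice([s1, s2, s3], f, E), cube(s2, ["p", "q"]))
```

Only the value at $\hat{s}_m$ is excluded, which is the stronger requirement justified by Lemma 5, rather than the weaker disjunction over all states of cex.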


Identifying Abstract States that Must Be Unreachable. A new candidate self composition
is constructed such that it satisfies all the constraints in E (thus ensuring that no
abstract counterexample will reappear). In the construction, we make sure to satisfy S3
(fairness). Therefore, for every abstract state $\hat{s}$, we choose a value $f'(\hat{s})$ that satisfies the
constraints in E and is non-starving: a value M is starving for $\hat{s}$ if $\hat{s} \models \bigvee_{j=1}^{k} \neg F_j(B)$
but $\hat{s} \models \bigwedge_{j \in M} F_j(B)$, i.e., some of the copies have not terminated in $\hat{s}$ but none of
the non-terminated copies is scheduled. (Due to adequacy, a value M is starving for $\hat{s}$
if and only if it is starving for every $s \in \gamma(\hat{s})$.)
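The starving condition can be checked directly once it is known which copies have terminated in an abstract state. The sketch below assumes this is given as a tuple `terminated`, where `terminated[j]` is True iff $F_j(B)$ holds in $\hat{s}$; copies are indexed from 0 rather than from 1, purely for convenience.

```python
from typing import FrozenSet, Tuple


def is_starving(terminated: Tuple[bool, ...], M: FrozenSet[int]) -> bool:
    """M is starving for a state if some copy has not terminated there
    (the state satisfies OR_j !F_j) while every copy scheduled by M has
    terminated (the state satisfies AND_{j in M} F_j)."""
    some_copy_running = not all(terminated)
    only_terminated_scheduled = all(terminated[j] for j in M)
    return some_copy_running and only_terminated_scheduled


# Copy 0 has terminated, copy 1 has not: scheduling only copy 0 starves copy 1.
print(is_starving((True, False), frozenset({0})))     # True
print(is_starving((True, False), frozenset({0, 1})))  # False
```

When all copies have terminated, no value is starving, so fairness only constrains states in which some copy can still move.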

If for some abstract state $\hat{s}$, all the non-starving values have already been excluded
(i.e., $(\hat{s}, M) \in E$ for every non-starving M), we conclude that there is no f' such that
$\hat{s}$ is reachable in $\mathcal{A}_{\mathcal{P}}(T^{f'})$ and f' is part of a composition-invariant pair:


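Lemma 6. Let $\hat{s} \in \hat{S}$ be an abstract state such that for every $\emptyset \neq M \subseteq \{1..k\}$ either
M is starving for $\hat{s}$ or $(\hat{s}, M) \in E$. Then, for every f' that satisfies S3, if $\mathcal{A}_{\mathcal{P}}(T^{f'})$
satisfies S1 and S2, then $\hat{s}$ is unreachable in $\mathcal{A}_{\mathcal{P}}(T^{f'})$.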


Corollary 2. If there exists a composition-invariant pair $(f', Inv')$, then $\hat{s}$ is unreachable
in $\mathcal{A}_{\mathcal{P}}(T^{f'})$.


This is because no matter how the self composition function f' would be defined, $\hat{s}$ is
guaranteed to have an outgoing abstract counterexample trace in $\mathcal{A}_{\mathcal{P}}(T^{f'})$.

We, therefore, set $f'(\hat{s})$ to be undefined. As a result, condition S2 of Algorithm 4
requires that $\hat{s}$ be unreachable in $\mathcal{A}_{\mathcal{P}}(T^{f'})$. In Algorithm 1, this is enforced by
adding $\hat{s}$ to Unreach (Line 8).
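Combining the starving check with the constraints in E yields the test of Lemma 6. The sketch below uses a hypothetical explicit encoding (frozensets of 0-based copy indices for values, a set of (state, value) pairs for E). It enumerates all values $\emptyset \neq M \subseteq \{1..k\}$ and adds a state to Unreach when none of them is still available. The function names are illustrative, not taken from PDSC.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

AbsState = Tuple[bool, ...]
Value = FrozenSet[int]


def is_starving(terminated: Tuple[bool, ...], M: Value) -> bool:
    """Some copy is still running, but every copy scheduled by M has terminated."""
    return not all(terminated) and all(terminated[j] for j in M)


def nonempty_subsets(k: int) -> List[Value]:
    """All values M with {} != M subset-of {0, ..., k-1}."""
    return [frozenset(c) for r in range(1, k + 1) for c in combinations(range(k), r)]


def allowed_values(s: AbsState, terminated: Tuple[bool, ...], k: int,
                   E: Set[Tuple[AbsState, Value]]) -> List[Value]:
    """Non-starving values for s that no constraint in E has excluded."""
    return [M for M in nonempty_subsets(k)
            if not is_starving(terminated, M) and (s, M) not in E]


def mark_if_unreachable(s: AbsState, terminated: Tuple[bool, ...], k: int,
                        E: Set[Tuple[AbsState, Value]], unreach: Set[AbsState]) -> bool:
    """Add s to Unreach (f'(s) becomes undefined) when no value is left for it."""
    if allowed_values(s, terminated, k, E):
        return False
    unreach.add(s)
    return True


s = (True,)
running = (False, False)          # neither copy has terminated in s
E = {(s, frozenset({0})), (s, frozenset({1}))}
unreach: Set[AbsState] = set()
print(allowed_values(s, running, 2, E))           # [frozenset({0, 1})]
E.add((s, frozenset({0, 1})))
print(mark_if_unreachable(s, running, 2, E, unreach), s in unreach)  # True True
```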

Every abstract state $\hat{s}$ that is added to Unreach strengthens the safety property by an
additional constraint that must be obeyed in any composition-invariant pair, and
obtaining a composition-invariant pair is the target of the algorithm. This makes our
algorithm property directed.

If an abstract state that satisfies $\varphi_{pre}(B)$ is added to Unreach, then Algorithm 1 determines
that no solution exists (Line 9). Otherwise, it generates a new constraint for E
based on the abstract state preceding $\hat{s}$ in the abstract counterexample (Line 12).


Constructing the Next Candidate Self Composition Function. Given the set of constraints
in E and the formula Unreach, Modify-SC (Line 13) generates the next candidate
composition function by (i) taking a constraint $(\hat{s}, M)$ such that $\hat{s} \not\models \mathit{Unreach}$ (typically
the one that was added last), (ii) selecting a non-starving value $M_{new}$ for $\hat{s}$ with
$(\hat{s}, M_{new}) \notin E$ (such a value must exist, otherwise $\hat{s}$ would have been added to Unreach),
and (iii) updating the conditions defining f' as follows:

$C'_{M} = C_{M} \wedge \neg\hat{s}(\mathcal{P}) \qquad C'_{M_{new}} = C_{M_{new}} \vee \hat{s}(\mathcal{P})$

The conditions of other values remain as before. This definition is facilitated by the fact
that the same set of predicates is used both for defining f' and for defining the abstract
states $\hat{s} \in \hat{S}$ (by which Inv is obtained). Note that in practice we do not explicitly
set f' to be undefined for $\gamma(\mathit{Unreach})$; its definitions on these states are simply ignored.
The definition ensures that f' is non-starving (satisfying condition S3) and that no two
distinct conditions $C'_{M_1}$, $C'_{M_2}$ overlap. While the latter is not required, it also does not
restrict the generality of the approach (since the language we consider is closed under
Boolean operations).

6 If the conditions $\{C_M\}_M$ defining f may overlap, we consider the condition $C_M$ by which
the transition from $\hat{s}_m$ to $\hat{s}_{m+1}$ was defined.
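Step (iii) can be sketched by representing each condition $C_M$ extensionally as the set of abstract states that satisfy it. This is an illustrative assumption, adequate only because the conditions are Boolean combinations of the same predicates that define the abstract states; the hypothetical `modify_sc` below returns a new dictionary and leaves the conditions of all other values untouched.

```python
from typing import Dict, FrozenSet, Optional, Set, Tuple

AbsState = Tuple[bool, ...]
Value = FrozenSet[int]
Conditions = Dict[Value, Set[AbsState]]   # C_M as the set of states satisfying it


def modify_sc(conds: Conditions, s: AbsState, m_old: Value, m_new: Value) -> Conditions:
    """C'_{m_old} = C_{m_old} and not s;  C'_{m_new} = C_{m_new} or s."""
    new = {M: set(states) for M, states in conds.items()}   # copy; input unchanged
    new.setdefault(m_old, set()).discard(s)
    new.setdefault(m_new, set()).add(s)
    return new


def value_of(conds: Conditions, s: AbsState) -> Optional[Value]:
    """f(s): the value whose condition contains s, or None if f is undefined at s."""
    for M, states in conds.items():
        if s in states:
            return M
    return None


a, b = (True, False), (False, True)
both, first = frozenset({0, 1}), frozenset({0})
conds = {both: {a, b}, first: set()}
updated = modify_sc(conds, a, m_old=both, m_new=first)
print(value_of(updated, a) == first, value_of(conds, a) == both)   # True True
```

Because $\hat{s}$ is removed from exactly one condition and added to exactly one other, disjoint conditions stay disjoint, mirroring the non-overlap property noted above.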


Theorem 2. Let T be a transition system, (pre, post) a k-safety property and $\mathcal{P}$ a set of
predicates over $V^{\parallel k}$. If Algorithm 1 returns "no solution" then there is no composition-invariant
pair for T and (pre, post) in $\mathcal{L}_{\mathcal{P}}$. Otherwise, $(f, Inv(\mathcal{P}))$ returned by Algorithm 1
is a composition-invariant pair in $\mathcal{L}_{\mathcal{P}}$, and thus $T \models (pre, post)$.


Complexity. Each iteration of Algorithm 1 adds at least one constraint to E, excluding
a potential value for f over some abstract state $\hat{s}$. An excluded value is never reused.
Hence, the number of iterations is at most the number of abstract states, $2^{|\mathcal{P}|}$, multiplied
by the number of potential values for each abstract state, $2^k$. Altogether, the number
of iterations is at most $O(2^{|\mathcal{P}|} \cdot 2^k)$. Each iteration makes one call to Abs_Reach,
which checks reachability via predicate abstraction; hence, assuming that satisfiability
checks in the original logic are at most exponential, its complexity is $2^{O(|\mathcal{P}|)}$. Therefore,
the overall complexity of the algorithm is $2^{O(|\mathcal{P}|)+k}$. Typically, k is a small constant,
hence the complexity is dominated by $2^{O(|\mathcal{P}|)}$.
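For a sense of scale, the iteration bound can be evaluated for concrete sizes. The values of $|\mathcal{P}|$ and k below are arbitrary and are not taken from the experiments.

```python
def iteration_bound(num_preds: int, k: int) -> int:
    """Upper bound 2^|P| * 2^k on the number of iterations of Algorithm 1."""
    abstract_states = 2 ** num_preds   # one abstract state per valuation of P
    values_per_state = 2 ** k          # candidate values M, i.e. subsets of {1..k}
    return abstract_states * values_per_state


# With k = 2, each additional predicate doubles the bound,
# whereas k contributes only a constant factor of 4.
print(iteration_bound(10, 2), iteration_bound(11, 2))   # 4096 8192
```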


5 Evaluation and Conclusion 


Implementation. We implemented PDSC (Algorithm 1) in Python on top of Z3 [25]. Its 
input is a transition system encoded by Constrained Horn Clauses (CHC) in SMT2 for- 
mat, a k-safety property and a set of predicates. The abstraction is implicitly encoded 
using the approach of [6], and is parameterized by a composition function that is mod- 
ified in each iteration. For reachability checks (Abs_Reach) we use SPACER [22], 
which supports LRA and arrays. For the set of predicates used by PDSC, we imple- 
mented an automatic procedure that mines these predicates from the CHC. Additional 
predicates may be added manually. 


Experiments. To evaluate PDSC, we compare it to SYNONYM [26], the current state of 
the art in k-safety verification. 

To show the effectiveness of PDSC, we consider examples that require a nontrivial
composition (these examples are detailed in [29]). We emphasize that these examples
are motivated by real-life scenarios. For example, Fig. 1 follows a pattern
of constant-time execution. The results of these experiments are summarized in Table 1.
Table 1. Examples that require semantic compositions

Program        | PDSC Time (s) | PDSC Iterations | SYNONYM
DoubleSquareNI | 7             | 33              | fail
HalfSquareNI   | 3.4           | 28              | fail
ArrayIntMod    | 58.2          | 168             | fail
SquaresSum     | 2.8           | 4               | fail
ArrayInsert    | 19.5          | 102             | fail

Fig. 2. Runtime comparison (in sec.): PDSC (x-axis) and SYNONYM (y-axis).


PDSC is able to find the right composition function and prove all of the examples, while 
SYNONYM cannot verify any of them. We emphasize that for these examples, lock-step 
composition is not sufficient. Instead, PDSC infers a composition that depends on the
programs' state (variable values), rather than just on program locations.

Next we consider Java programs from [26,30], which we manually converted to C, 
and then converted to CHC using SEAHORN [19]. For all but 3 examples, only 2 types 
of predicates, which we mined automatically, were sufficient for verification: (i) rela- 
tional predicates derived from the pre- and post-conditions, and (ii) for simple loops that 
have an index variable (e.g., for iterating over an array), an equality predicate between 
the copies of the indices. These predicates were sufficient since we used a large-step 
encoding of the transition relation, hence the abstraction via predicates takes effect only 
at cut-points. For the remaining 3 examples, we manually added 2 to 4 predicates. With
the exception of 1 example where a timeout of 10 seconds was reached, all examples 
were solved with a lock-step composition function. Yet, we include them to show that 
on examples with simple compositions PDSC performs similarly to SYNONYM. This 
can be seen in Fig. 2. 


Conclusion and Future Work. This work formulates the problem of inferring a self 
composition function together with an inductive invariant for the composed program, 
thus capturing the interplay between the self composition and the difficulty of verifying
the resulting composed program. To address this problem, we present PDSC, an
algorithm for inferring a semantic self composition, directed at verifying the composed
program with a given language of predicates. We show that PDSC manages to find
nontrivial self compositions that are beyond the reach of existing tools. In future work, we are
interested in further improving PDSC by extending it with additional (possibly lazy)
predicate discovery abilities. This has the potential to both improve performance and
verify properties over a wider range of programs. Additionally, we plan to explore
further generalization techniques during the inference procedure.
Abstract. Stochastic multiplayer games (SMGs) have gained attention
in the field of strategy synthesis for multi-agent reactive systems. However,
standard SMGs are limited to modeling systems where all agents
have full knowledge of the state of the game. In this paper, we introduce
the delayed-action games (DAGs) formalism, which simulates hidden-information
games (HIGs) as SMGs, where hidden information is captured
by delaying a player's actions. The elimination of private variables
enables the use of off-the-shelf SMG model checkers to implement
HIGs. Furthermore, we demonstrate how a DAG can be decomposed into
subgames that can be independently explored, utilizing parallel computation
to reduce the model checking time, while alleviating the state
space explosion problem that SMGs are notorious for. In addition, we
propose a DAG-based framework for strategy synthesis and analysis.
Finally, we demonstrate the applicability of the DAG-based synthesis framework
on a case study of a human-on-the-loop unmanned-aerial vehicle
system under stealthy attacks, where the proposed framework is used to
formally model, analyze and synthesize security-aware strategies for the
system.


1 Introduction 


Stochastic multiplayer games (SMGs) are used to model reactive systems where 
nondeterministic decisions are made by multiple players [4, 13,23]. SMGs extend 
probabilistic automata by assigning a player to each choice to be made in the 
game. This extension enables modeling of complex systems where the behavior of 
players is unknown at design time. The strategy synthesis problem aims to find a 
winning strategy, i.e., a strategy that guarantees that a set of objectives (or win- 
ning conditions) is satisfied [6,21]. Algorithms for synthesis include, for instance, 
value iteration and strategy iteration techniques, where multiple reward-based 
objectives are satisfied [2,9,17]. To tackle the state-space explosion problem, 
[29] presents an assume-guarantee synthesis framework that relies on synthesiz- 
ing strategies on the component level first, before composing them into a global 
winning strategy. Mean-payoffs and ratio rewards are further investigated in [3] 
to synthesize ε-optimal strategies. Formal tools that support strategy synthesis
via SMGs include PRISM-games [7,19] and Uppaal Stratego [10]. 

SMGs are classified based on the number of players that can make choices
at each state. In concurrent games, more than one player is allowed to concurrently
make choices at a given state. Conversely, turn-based games assign at most one
player to each state. Another classification considers the information
available to different players across the game [27]. Complete-information games
(also known as perfect-information games [5]) grant all players complete access
to the information within the game. In symmetric games, some information is
equally hidden from all players. In contrast, asymmetric games allow some
players to have access to more information than the others [27].

This work is motivated by security-aware systems in which stealthy adversarial
actions are potentially hidden from the system, where the latter can probabilistically
and intermittently gain full knowledge about the current state. While
hidden-information games (HIGs) can be used to model such systems by using
private variables to capture hidden information [5], standard model checkers can
only synthesize strategies for (full-information) SMGs, which calls for alternative
representations. The equivalence between turn-based semi-perfect information
games and concurrent perfect-information games has been shown in [5]. Since
a player's strategy mainly relies on full knowledge of the game state [9], using
SMGs for synthesis produces strategies that may violate synthesis specifications
in cases where required information is hidden from the player. Partially-observable
stochastic games (POSGs) allow agents to have different belief states
by incorporating uncertainty about both the current state and adversarial plans
[15]. Techniques such as active sensing for online replanning [14] and grid-based
abstractions of belief spaces [24] have been proposed to mitigate synthesis complexity
arising from partial observability. The notion of delaying actions has been
studied as a means for gaining information about a game to improve future strategies
[18,30], but has not been deployed as a means for hiding information.

To this end, we introduce delayed-action games (DAGs)—a new class of 
games that simulate HIGs, where information is hidden from one player by 
delaying the actions of the others. The omission of private variables enables the 
use of off-the-shelf tools to implement and analyze DAG-based models. We show 
how DAGs (under some mild and practical assumptions) can be decomposed 
into subgames that can be independently explored, reducing the time required 
for synthesis by employing parallel computation. Moreover, we propose a DAG- 
based framework for strategy synthesis and analysis of security-aware systems. 
Finally, we demonstrate the framework’s applicability through a case study of 
security-aware planning for an unmanned-aerial vehicle (UAV) system prone to 
stealthy cyber attacks, where we develop a DAG-based system model and further 
synthesize strategies with strong probabilistic security guarantees. 

The paper is organized as follows. Section 2 presents SMGs, HIGs, and the problem
formulation. In Sect. 3, we introduce DAGs and show that they can simulate
HIGs. Section 4 proposes a DAG-based synthesis framework, which we
use for security-aware planning for UAVs in Sect. 5, before concluding the paper
in Sect. 6.
2 Stochastic Games


In this section, we present turn-based stochastic games, which assume that all 
players have full information about the game state. We then introduce hidden- 
information games and their private-variable semantics. 


Notation. We use $\mathbb{N}_0$ to denote the set of non-negative integers. $\mathcal{P}(A)$ denotes
the powerset of A (i.e., $2^A$). A variable v has a set of valuations $Ev(v)$, where
$\eta(v) \in Ev(v)$ denotes one. We use $X^*$ to denote the set of all finite words over
alphabet X, including the empty word $\epsilon$. The mapping $\mathit{Eff}: X^* \times Ev(v) \to Ev(v)$
indicates the effect of a finite word on $\eta(v)$. Finally, for general indexing, we use
$s_i$ or $s_\gamma$ for $i \in \mathbb{N}_0$ and $\gamma \in \Gamma$, while $\mathrm{PL}_\gamma$ denotes Player $\gamma$.
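The mapping Eff can be read as a left fold of single-letter effects over the word. The sketch assumes, purely for illustration, that valuations are grid cells (x, y) and that the alphabet consists of the eight compass actions of the running example; the effect table `STEP` is hypothetical.

```python
from functools import reduce
from typing import Sequence, Tuple

Cell = Tuple[int, int]   # hypothetical valuation eta(v): a grid cell (x, y)

# Hypothetical one-step effect of each compass action.
STEP = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0),
        "NE": (1, 1), "NW": (-1, 1), "SE": (1, -1), "SW": (-1, -1)}


def eff(word: Sequence[str], val: Cell) -> Cell:
    """Eff: X* x Ev(v) -> Ev(v), applying the letters of the word from left to right.
    The empty word leaves the valuation unchanged."""
    return reduce(lambda cell, a: (cell[0] + STEP[a][0], cell[1] + STEP[a][1]), word, val)


print(eff([], (0, 0)), eff(["N", "NE"], (0, 0)))   # (0, 0) (1, 2)
```

Defined this way, Eff is compositional: applying uv to a valuation equals applying v to the result of applying u.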


Turn-Based Stochastic Games (SMGs). SMGs can be used to model reac- 
tive systems that undergo both stochastic and nondeterministic transitions from 
one state to another. In a turn-based game,$^{1}$ actions can be taken at any state
by at most one player. Formally, an SMG can be defined as follows [1,28,29].


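Definition 1 (Turn-Based Stochastic Game). A turn-based game (SMG)
with players $\Gamma = \{I, II, \bigcirc\}$ is a tuple $G = \langle S, (S_I, S_{II}, S_{\bigcirc}), A, s_0, \delta \rangle$, where

- S is a finite set of states, partitioned into $S_I$, $S_{II}$ and $S_{\bigcirc}$;

- $A = A_I \cup A_{II} \cup \{\tau\}$ is a finite set of actions where $\tau$ is an empty action;

- $s_0 \in S_{II}$ is the initial state; and

- $\delta: S \times A \times S \to [0,1]$ is a transition function, such that $\delta(s, a, s') \in \{1, 0\}$,
$\forall s \in S_I \cup S_{II}$, $a \in A$ and $s' \in S$, and $\delta(s, \tau, s') \in [0,1]$, $\forall s \in S_{\bigcirc}$ and
$s' \in S_I \cup S_{II}$, where $\sum_{s' \in S_I \cup S_{II}} \delta(s, \tau, s') = 1$ holds.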


For all $s \in S_I \cup S_{II}$ and $a \in A_I \cup A_{II}$, we write $s \xrightarrow{a} s'$ if $\delta(s, a, s') = 1$. Similarly, for
all $s \in S_{\bigcirc}$ we write $s \xrightarrow{p} s'$ if $s'$ is randomly sampled with probability $p = \delta(s, \tau, s')$.
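The side conditions on δ in Definition 1 are easy to check mechanically on a finite transition table. The sketch below is a hypothetical validator; the dictionary encoding and all names are assumptions and are not part of any SMG tool. Player states must move deterministically, and chance states must use τ to reach player states with probabilities that sum to 1.

```python
from typing import Dict, Set, Tuple

TAU = "tau"
Delta = Dict[Tuple[str, str, str], float]   # (s, a, s') -> delta(s, a, s'); missing = 0


def check_smg(player_states: Set[str], chance_states: Set[str], delta: Delta) -> bool:
    for (s, a, t), p in delta.items():
        # Player states (S_I and S_II): delta(s, a, s') must be 0 or 1.
        if s in player_states and p not in (0, 1):
            return False
        # Chance states (S_O): positive probability only via tau, into S_I or S_II.
        if s in chance_states and p > 0 and (a != TAU or t not in player_states):
            return False
    for c in chance_states:
        total = sum(p for (s, a, _), p in delta.items() if s == c and a == TAU)
        if abs(total - 1.0) > 1e-9:   # tolerance for floating-point sums
            return False
    return True


delta = {("s0", "N", "s1"): 1, ("s1", "NE", "c"): 1,
         ("c", TAU, "s0"): 0.7, ("c", TAU, "s2"): 0.3}
print(check_smg({"s0", "s1", "s2"}, {"c"}, delta))   # True
```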


Hidden-Information Games. SMGs assume that all players have full knowledge
of the current state, and hence provide perfect-information models [5]. In
many applications, however, this assumption may not hold. A prime example
is security-aware models, where stealthy adversarial actions can be hidden from
the system; e.g., the system may not even be aware that it is under attack.
Hidden-information games (HIGs), in contrast, refer to games where one
player does not have complete access to (or knowledge of) the current state.
The notion of hidden information can be formalized with the use of private
variables (PVs) [5]. Specifically, a game state can be encoded using variables $v_t$ and
$v_p$, representing, respectively, the true information, which is only known to $\mathrm{PL}_I$, and
the belief of $\mathrm{PL}_{II}$.


1 The term turn-based indicates that at any state only one player can play an action. 
It does not necessarily imply that players take fair turns. 
Definition 2 (Hidden-Information Game). A hidden-information stochastic
game (HIG) with players $\Gamma = \{I, II, \bigcirc\}$ over a set of variables $V = \{v_t, v_p\}$
is a tuple $G_H = \langle S, (S_I, S_{II}, S_{\bigcirc}), A, s_0, \beta, \delta \rangle$, where

- set of states $S \subseteq Ev(v_t) \times Ev(v_p) \times \mathcal{P}(Ev(v_t)) \times \Gamma$, partitioned in $S_I$, $S_{II}$, $S_{\bigcirc}$;

- $A = A_I \cup A_{II} \cup \{\tau, \theta\}$ is a finite set of actions, where $\tau$ denotes an empty action,
and $\theta$ is the action capturing the $\mathrm{PL}_{II}$ attempt to reveal the true value $v_t$;

- $s_0 \in S_{II}$ is the initial state;

- $\beta: A_{II} \to \mathcal{P}(A_I)$ is a function that defines the set of available $\mathrm{PL}_I$ actions,
based on the $\mathrm{PL}_{II}$ action; and

- $\delta: S \times A \times S \to [0,1]$ is a transition function such that $\delta(s_I, a, s_{\bigcirc}) =
\delta(s_{\bigcirc}, a, s_I) = 0$, and $\delta(s_{II}, \theta, s_{\bigcirc}), \delta(s_I, a, s_{II}), \delta(s_{II}, a, s_I) \in \{0,1\}$ for all
$s_I \in S_I$, $s_{II} \in S_{II}$, $s_{\bigcirc} \in S_{\bigcirc}$ and $a \in A$, where $\sum_{s' \in S_{II}} \delta(s_{\bigcirc}, \tau, s') = 1$.


In the above definition, $\delta$ only allows transitions $s_I$ to $s_{II}$, $s_{II}$ to $s_I$ or $s_{\bigcirc}$,
with $s_{II}$ to $s_{\bigcirc}$ conditioned by action $\theta$, and probabilistic transitions $s_{\bigcirc}$ to $s_{II}$.
A game state can be written as $s = (t, u, \Omega, \gamma)$, but to simplify notation we use
$s_\gamma(t, u, \Omega)$ instead, where $t \in Ev(v_t)$ is the true value of the game, $u \in Ev(v_p)$
is the current belief of $\mathrm{PL}_{II}$, $\Omega \in \mathcal{P}(Ev(v_t)) \setminus \{\emptyset\}$ is the belief space of $\mathrm{PL}_{II}$, and $\gamma \in \Gamma$ is
the current player's index. When the truth is hidden from $\mathrm{PL}_{II}$, the belief space
$\Omega$ is the information set [27], capturing the knowledge of $\mathrm{PL}_{II}$ about the possible true
values.


Example 1 (Belief vs. True Value). Our motivating example is a system that consists
of a UAV and a human operator. For localization, the UAV mainly relies on a
GPS sensor that can be compromised to effectively steer the UAV away from its
original path. While aggressive attacks can be detected, some may remain stealthy by
introducing only bounded errors at each step [16,20,22,26]. For example, Fig. 1 shows
a UAV ($\mathrm{PL}_{II}$) occupying zone A and flying north (N). An adversary ($\mathrm{PL}_I$) can launch a
stealthy attack targeting its GPS, introducing a bounded error (NE, NW) to remain stealthy.
The set of stealthy actions available to the attacker depends on the preceding UAV action,
which is captured by the function $\beta$, where $\beta(N) = \{NE, N, NW\}$. Being unaware
of the attack, the UAV believes that it is entering zone C, while the true new
location is D due to the attack (NE). Initially, $\eta(v_t) = \eta(v_p) = z_A$, and $\Omega = \{z_A\}$
as the UAV is certain it is in zone $z_A$. In $s_2$, $\eta(v_p) = z_C$, yet $\eta(v_t) = z_D$.
Although $v_t$ is hidden, $\mathrm{PL}_{II}$ is aware that $\eta(v_t)$ is in $\Omega = \{z_B, z_C, z_D\}$.

Fig. 1. The UAV belief (solid square) vs. the true value (solid diamond) of its location.


HIG Semantics. The semantics of $G_H$ is described using the rules shown in Fig. 2, where
H1 defines the initial state, and H2 and H3 capture $\mathrm{PL}_{II}$ and $\mathrm{PL}_I$ moves, respectively. The rule H4 specifies
that a $\mathrm{PL}_{II}$ attempt $\theta$ to reveal the true value can succeed with probability $p_i$,
in which case the belief of $\mathrm{PL}_{II}$ is updated (i.e., $u' = t$); otherwise, it remains unchanged.
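H1: $s_0 = s_{II}(t_0, u_0, \Omega_0)$ if $t_0 = u_0$, $\Omega_0 = \{t_0\}$

H2: $s_{II}(t, u, \Omega) \xrightarrow{a_i} s_I(t', u', \Omega')$ if $a_i \in A_{II}$, $t' = t$, $u' = \mathit{Eff}(a_i, u)$,
    $\Omega' = \{t' \mid t' = \mathit{Eff}(b_i, t),\ b_i \in \beta(a_i),\ t \in \Omega\}$
    $s_{II}(t, u, \Omega) \xrightarrow{\theta} s_{\bigcirc}(t', u', \Omega')$ if $t' = t$, $u' = u$, $\Omega' = \Omega$

H3: $s_I(t, u, \Omega) \xrightarrow{b_i} s_{II}(t', u', \Omega')$ if $b_i \in \beta(a_i)$, $t' = \mathit{Eff}(b_i, t)$, $u' = u$, $\Omega' = \Omega$

H4: $s_{\bigcirc}(t, u, \Omega) \xrightarrow{p_i} s_{II}(t', u', \Omega')$ if $t' = t$, $u' = t$, $\Omega' = \{t\}$, $p_i = \delta(s_{\bigcirc}, \tau, s_{II})$
    $s_{\bigcirc}(t, u, \Omega) \xrightarrow{1-p_i} s'_{II}(t', u', \Omega')$ if $t' = t$, $u' = u$, $\Omega' = \Omega$, $1 - p_i = \delta(s_{\bigcirc}, \tau, s'_{II})$

Fig. 2. Semantic rules for an HIG.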
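Rules H2 to H4 can be traced on the running example. The sketch assumes a hypothetical grid encoding in which zones are cells (x, y), with zone A at (0, 0), so that zones B, C and D of Fig. 1 become (-1, 1), (0, 1) and (1, 1). β is filled in only for the four cardinal actions: β(N) = {NE, N, NW} as in Example 1, the other entries by analogy. This is an illustration, not a model-checker encoding.

```python
from typing import FrozenSet, Tuple

Cell = Tuple[int, int]
STEP = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0),
        "NE": (1, 1), "NW": (-1, 1), "SE": (1, -1), "SW": (-1, -1)}
# beta(N) follows Example 1; the other entries are hypothetical analogues.
BETA = {"N": ("NW", "N", "NE"), "E": ("NE", "E", "SE"),
        "S": ("SE", "S", "SW"), "W": ("SW", "W", "NW")}


def eff1(a: str, c: Cell) -> Cell:
    return (c[0] + STEP[a][0], c[1] + STEP[a][1])


def h2(t: Cell, u: Cell, omega: FrozenSet[Cell], a: str):
    """PL_II plays a: truth unchanged, belief moved, belief space widened by beta(a)."""
    return t, eff1(a, u), frozenset(eff1(b, x) for b in BETA[a] for x in omega)


def h3(t: Cell, u: Cell, omega: FrozenSet[Cell], a: str, b: str):
    """PL_I answers with b in beta(a): only the hidden true value moves."""
    assert b in BETA[a], "b must be an available (stealthy) action"
    return eff1(b, t), u, omega


def h4(t: Cell, u: Cell, omega: FrozenSet[Cell], revealed: bool):
    """Outcome of theta: on success u' = t and Omega' = {t}; otherwise unchanged."""
    return (t, t, frozenset({t})) if revealed else (t, u, omega)


A = (0, 0)
t, u, omega = h3(*h2(A, A, frozenset({A}), "N"), "N", "NE")
# t is zone D = (1, 1), u is zone C = (0, 1), and omega = {B, C, D}.
```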


Example 2 (HIG Semantics). Continuing Example 1, let us assume that the set
of actions $A_I = A_{II} = \{N, S, E, W, NE, NW, SE, SW\}$, and that $\theta = GT$ is a geolocation
task that attempts to reveal the true value of the game.$^{2}$ Now, consider
the scenario illustrated in Fig. 3. At the initial state $s_0$, the UAV attempts to
move north (N), progressing the game to the state $s_1$, where the adversary takes
her turn by selecting an action from the set $\beta(N) = \{NE, N, NW\}$. The players
take turns until the UAV performs a geolocation task GT, moving from the state
$s_4$ to $s_5$. With probability $p = \delta(s_5, \tau, s_6)$, the UAV detects its true location
and updates its belief accordingly (i.e., to $s_6$). Otherwise, the belief remains the
same (i.e., equal to that in $s_4$).




Fig. 3. An example of the UAV motion in a 2D-grid map, modeled as an HIG. Solid 
squares represent the UAV belief, while solid diamonds represent the ground truth. 
The UAV action GT denotes performing a geolocation task. 


Problem Formulation. Following the system described in Example 2, we
now consider the composed HIG $G_H = M_{adv} \| M_{uav} \| M_{as}$ shown in Fig. 4; the
HIG-based model incorporates standard models of a UAV ($M_{uav}$), an adversary
($M_{adv}$), and a geolocation-task advisory system ($M_{as}$) (e.g., as introduced
in [11,12]). Here, the probability of a successful detection $p(v_t, v_p)$ is a function
of both the location the UAV believes to be its current location ($v_p$) and
the ground truth location that the UAV actually occupies ($v_t$). Reasoning
about the flight plan using such a model becomes problematic since the ground
truth $v_t$ is inherently unknown to the UAV (i.e., $\mathrm{PL}_{II}$), and thus so is $p(v_t, v_p)$.
Furthermore, such a representation, where some information is hidden, is not
supported by off-the-shelf SMG model checkers. Consequently, for such HIGs, our
goal is to find an alternative representation that is suitable for strategy synthesis
using off-the-shelf SMG model checkers.

2 A geolocation task is an attempt to localize the UAV by examining its camera feed.


(a) Legend: guard, channel transmit (!), channel receive (?), assignment, state, transition.


Fig. 4. An example of an HIG-based system model comprising the UAV ($M_{uav}$), the
adversary ($M_{adv}$), and the AS ($M_{as}$). Framed information is hidden from the UAV-AS.


3 Delayed-Action Games 


In this section, we provide an alternative representation of HIGs that eliminates 
the use of private variables—we introduce Delayed-Action Games (DAGs) that 
exploit the notion of delayed actions. Furthermore, we show that for any HIG, 
a DAG that simulates the former can be constructed. 


Delayed Actions. Informally, a DAG reconstructs an HIG such that actions
of $\mathrm{PL}_I$ (the player with access to perfect information) follow the actions of $\mathrm{PL}_{II}$,
i.e., $\mathrm{PL}_I$ actions are delayed. This rearrangement of the players' actions provides
a means to hide information from $\mathrm{PL}_{II}$ without the use of private variables,
since in this case, at $\mathrm{PL}_{II}$ states, $\mathrm{PL}_I$ actions have not occurred yet. In this
way, $\mathrm{PL}_{II}$ can act as though she has complete information at the moment she
makes her decision, as the future state has not yet happened and so cannot
be known. In essence, the formalism can be seen as a partial ordering of the
players' actions, exploiting the (partial) superposition property that a wide class
of physical systems exhibit. To demonstrate this notion, let us consider DAG
modeling on our running example.


Example 3 (Delaying Actions). Figure 5 depicts the (HIG-based) scenario from
Fig. 3, but in the corresponding DAG, where the UAV actions are performed first
(in $\hat{s}_0, \hat{s}_1, \hat{s}_2$), followed by the adversary's delayed actions (in $\hat{s}_3, \hat{s}_4$). Note that,
in the DAG model, at the time the UAV executed its actions ($\hat{s}_0, \hat{s}_1, \hat{s}_2$), the
adversary's actions had not occurred (yet). Moreover, $\hat{s}_0$ and $\hat{s}_6$ (Fig. 5) share
the same belief and true values as $s_0$ and $s_6$ (Fig. 3), respectively, though the
transient states do not exactly match. This will be used to show the relationship
between the games.
Fig. 5. The same scenario as in Fig. 3, modeled as a DAG. Solid squares represent UAV
belief, while solid diamonds represent the ground truth. The UAV action GT denotes 
performing a geolocation task. 


The advantage of this approach is twofold. First, the elimination of private
variables enables simulation of an HIG using a full-information game. Thus,
the formulation of the strategy synthesis problem using off-the-shelf SMG-based
tools becomes feasible. In particular, a synthesized $\mathrm{PL}_{II}$ strategy becomes dependent
on the knowledge of $\mathrm{PL}_I$ behavior (possible actions), rather than on the specific
(hidden) actions. We formalize a DAG as follows.


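Definition 3 (Delayed-Action Game). A DAG of an HIG $G_H = \langle S, (S_I,
S_{II}, S_{\bigcirc}), A, s_0, \beta, \delta \rangle$, with players $\Gamma = \{I, II, \bigcirc\}$ over a set of variables $V =
\{v_t, v_p\}$, is a tuple $G_D = \langle \hat{S}, (\hat{S}_I, \hat{S}_{II}, \hat{S}_{\bigcirc}), A, \hat{s}_0, \beta, \hat{\delta} \rangle$ where

- $\hat{S} \subseteq Ev(v_t) \times Ev(v_p) \times A_{II}^* \times \mathbb{N}_0 \times \Gamma$ is the set of states, partitioned into
$\hat{S}_I$, $\hat{S}_{II}$ and $\hat{S}_{\bigcirc}$;

- $\hat{s}_0 \in \hat{S}_{II}$ is the initial state; and

- $\hat{\delta}: \hat{S} \times A \times \hat{S} \to [0,1]$ is a transition function such that $\hat{\delta}(\hat{s}_{II}, a, \hat{s}_{\bigcirc}) =
\hat{\delta}(\hat{s}_I, a, \hat{s}_{II}) = \hat{\delta}(\hat{s}_{\bigcirc}, a, \hat{s}_I) = 0$, and $\hat{\delta}(\hat{s}_{II}, a, \hat{s}_{II}) \in \{0,1\}$, $\hat{\delta}(\hat{s}_{II}, \theta, \hat{s}_I) \in \{0,1\}$,
$\hat{\delta}(\hat{s}_I, a, \hat{s}_I) \in \{0,1\}$, $\hat{\delta}(\hat{s}_I, a, \hat{s}_{\bigcirc}) \in \{0,1\}$, for all $\hat{s}_I \in \hat{S}_I$, $\hat{s}_{II} \in \hat{S}_{II}$, $\hat{s}_{\bigcirc} \in \hat{S}_{\bigcirc}$
and $a \in A$, where $\sum_{\hat{s}' \in \hat{S}_{II}} \hat{\delta}(\hat{s}_{\bigcirc}, \tau, \hat{s}') = 1$.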


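Note that, in contrast to the transition function $\delta$ in HIG $G_H$, $\hat{\delta}$ in DAG $G_D$ only
allows transitions $\hat{S}_{II}$ to $\hat{S}_I$ or $\hat{S}_{II}$, as well as $\hat{S}_I$ to $\hat{S}_I$ or $\hat{S}_{\bigcirc}$, and probabilistic
transitions $\hat{S}_{\bigcirc}$ to $\hat{S}_{II}$; also note that $\hat{S}_{II}$ to $\hat{S}_I$ is conditioned by the action $\theta$.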


DAG Semantics. A DAG state is a tuple $\hat{s} = (\hat{t}, \hat{u}, w, j, \gamma)$, which for simplicity
we shorthand as $\hat{s}_\gamma(\hat{t}, \hat{u}, w, j)$, where $\hat{t} \in Ev(v_t)$ is the last known true value,
$\hat{u} \in Ev(v_p)$ is the belief of $\mathrm{PL}_{II}$, $w \in A_{II}^*$ captures the $\mathrm{PL}_{II}$ actions taken since the last
known true value, $j \in \mathbb{N}_0$ is an index on w, and $\gamma \in \Gamma$ is the current player
index. The game transitions are defined using the semantic rules from Fig. 6.
Note that $\mathrm{PL}_{II}$ can execute multiple moves (i.e., actions) before executing $\theta$ to
attempt to reveal the true value (D2), moving to a $\mathrm{PL}_I$ state where $\mathrm{PL}_I$ executes
all her delayed actions before reaching a 'revealing' state $\hat{s}_{\bigcirc}$ (D3). Finally, the
revealing attempt can succeed with probability $p_i$, in which case the belief of $\mathrm{PL}_{II}$ is updated
(i.e., $\hat{u}' = \hat{t}$); otherwise, it remains unchanged (D4).
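The DAG rules D2 and D3 can be sketched in a hypothetical grid encoding in which zones are cells (x, y) and β is given only for the four cardinal actions (β(N) = {NE, N, NW} as in Example 1, the others by analogy). $\mathrm{PL}_{II}$ only moves the belief and appends each action to w; after θ, $\mathrm{PL}_I$ replays one delayed response $b_j \in \beta(w_j)$ per recorded action to update the true value. The function names are illustrative, and the probabilistic step D4 is left out.

```python
from typing import Sequence, Tuple

Cell = Tuple[int, int]
STEP = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0),
        "NE": (1, 1), "NW": (-1, 1), "SE": (1, -1), "SW": (-1, -1)}
BETA = {"N": ("NW", "N", "NE"), "E": ("NE", "E", "SE"),
        "S": ("SE", "S", "SW"), "W": ("SW", "W", "NW")}


def eff1(a: str, c: Cell) -> Cell:
    return (c[0] + STEP[a][0], c[1] + STEP[a][1])


def d2(t_hat: Cell, u_hat: Cell, w: Tuple[str, ...], a: str):
    """D2 with a in A_II: the last known truth stays, the belief moves, w' = w a."""
    return t_hat, eff1(a, u_hat), w + (a,)


def d3(t_hat: Cell, w: Sequence[str], responses: Sequence[str]) -> Cell:
    """D3: after theta, PL_I plays b_j in beta(w_j) for j = 0, ..., |w| - 1."""
    assert len(responses) == len(w)
    for a, b in zip(w, responses):
        assert b in BETA[a], "delayed response must be in beta(w_j)"
        t_hat = eff1(b, t_hat)
    return t_hat


A = (0, 0)
t_hat, u_hat, w = A, A, ()
for a in ("N", "N"):                 # PL_II moves twice before calling theta
    t_hat, u_hat, w = d2(t_hat, u_hat, w, a)
t_hat = d3(t_hat, w, ("NE", "N"))    # PL_I's delayed responses
print(u_hat, t_hat, w)               # (0, 2) (1, 2) ('N', 'N')
```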
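D1: $\hat{s}_0 = \hat{s}_{II}(\hat{t}_0, \hat{u}_0, w_0, 0)$ if $\hat{t}_0 = \hat{u}_0$, $w_0 = \epsilon$

D2: $\hat{s}_{II}(\hat{t}, \hat{u}, w, 0) \xrightarrow{a_i} \hat{s}_{II}(\hat{t}', \hat{u}', w', 0)$ if $a_i \in A_{II}$, $\hat{t}' = \hat{t}$, $\hat{u}' = \mathit{Eff}(a_i, \hat{u})$, $w' = w a_i$
    $\hat{s}_{II}(\hat{t}, \hat{u}, w, 0) \xrightarrow{\theta} \hat{s}_I(\hat{t}', \hat{u}', w', 0)$ if $\hat{t}' = \hat{t}$, $\hat{u}' = \hat{u}$, $w' = w$

D3: $\hat{s}_I(\hat{t}, \hat{u}, w, j) \xrightarrow{b_i} \hat{s}_I(\hat{t}', \hat{u}', w', j+1)$ if $b_i \in \beta(w_j)$, $\hat{t}' = \mathit{Eff}(b_i, \hat{t})$, $\hat{u}' = \hat{u}$, $w' = w$, $j < |w| - 1$

D4: $\hat{s}_{\bigcirc}(\hat{t}, \hat{u}, w, j) \longrightarrow \hat{s}_{II}$

Fig. 6. Semantic rules for DAGs.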


In both $G_H$ and $G_D$, we label states where all players have full knowledge of
the current state as proper. We also say that two states are similar if they agree
on the belief, and equivalent if they agree on both the belief and the ground truth.


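Definition 4 (States). Let $s_\gamma(t, u, \Omega) \in S$ and $\hat{s}_{\hat{\gamma}}(\hat{t}, \hat{u}, w, j) \in \hat{S}$. We say:

- $s_\gamma$ is proper iff $\Omega = \{t\}$, denoted by $s_\gamma \in Prop(G_H)$.

- $\hat{s}_{\hat{\gamma}}$ is proper iff $w = \epsilon$, denoted by $\hat{s}_{\hat{\gamma}} \in Prop(G_D)$.

- $s_\gamma$ and $\hat{s}_{\hat{\gamma}}$ are similar iff $\hat{u} = u$, $\hat{t} \in \Omega$, and $\gamma = \hat{\gamma}$, denoted by $s_\gamma \sim \hat{s}_{\hat{\gamma}}$.

- $s_\gamma$, $\hat{s}_{\hat{\gamma}}$ are equivalent iff $t = \hat{t}$, $u = \hat{u}$, $w = \epsilon$, and $\gamma = \hat{\gamma}$, denoted by $s_\gamma \approx \hat{s}_{\hat{\gamma}}$.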


From the above definition, we have that $s \approx \hat{s} \implies s \in Prop(G_H), \hat{s} \in Prop(G_D)$.
We now define execution fragments, i.e., possible progressions from one state to another.


Definition 5 (Execution Fragment). An execution fragment (of either an
SMG, DAG or HIG) is a finite sequence of states, actions and probabilities

$\varrho = s_0 a_1 p_1 s_1 a_2 p_2 s_2 \ldots a_n p_n s_n$ such that $s_i \xrightarrow{a_{i+1}\,(p_{i+1})} s_{i+1}$, $\forall i \geq 0$.$^{3}$

We use $first(\varrho)$ and $last(\varrho)$ to refer to the first and last states of $\varrho$, respectively. If
both states are proper, we say that $\varrho$ is proper as well, denoted by $\varrho \in Prop(G_H)$.$^{4}$
Moreover, $\varrho$ is deterministic if no probabilities appear in the sequence.


Definition 6 (Move). A move $m_\gamma$ of an execution $\varrho$ from state $s \in \varrho$, denoted
by $move_\gamma(s, \varrho)$, is a sequence of actions $a_1 a_2 \ldots a_i \in A_\gamma^*$ that player $\gamma$ performs
in $\varrho$ starting from s.


By omitting the player index we refer to the moves of all players. To simplify
notation, we use $move(\varrho)$ as a short notation for $move(first(\varrho), \varrho)$. We write
$\langle m \rangle(first(\varrho)) = last(\varrho)$ to denote that the execution of move m from $first(\varrho)$
leads to $last(\varrho)$. This allows us to now define the delay operator as follows.


3 For deterministic transitions, p = 1, hence omitted from $\varrho$ for readability.
4 An execution fragment lives in the transition system (TS), i.e., $\varrho \in Prop(TS(G))$.
We omit TS for readability.
Definition 7 (Delay Operator). For a $G_H$, let $m = move(\varrho) =
a_1 b_1 \ldots a_n b_n \theta$ be a move for some deterministic $\varrho \in TS(G_H)$, where $a_1 \ldots a_n \in
A_{II}^*$, $b_1 \ldots b_n \in A_I^*$. The delay operator, denoted by $\overline{m}$, is defined by the rule
$\overline{m} = a_1 \ldots a_n \theta b_1 \ldots b_n$.


Intuitively, the delay operator shifts $\mathrm{PL}_I$ actions to the right of $\mathrm{PL}_{II}$ actions up
until the next probabilistic state.
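The delay operator only reorders a move, so it can be sketched as a list transformation. The sketch assumes, as in Definition 7, that the move alternates $\mathrm{PL}_{II}$ and $\mathrm{PL}_I$ actions and ends with θ; actions are plain strings here, and `THETA` is a hypothetical stand-in for θ.

```python
from typing import List

THETA = "theta"


def delay(move: List[str]) -> List[str]:
    """a_1 b_1 ... a_n b_n theta  ->  a_1 ... a_n theta b_1 ... b_n."""
    assert move and move[-1] == THETA, "the move must end with theta"
    body = move[:-1]
    assert len(body) % 2 == 0, "expected alternating a_i b_i pairs"
    a_part = body[0::2]   # PL_II actions (even positions)
    b_part = body[1::2]   # PL_I actions (odd positions)
    return a_part + [THETA] + b_part


print(delay(["N", "NE", "N", "NW", THETA]))   # ['N', 'N', 'theta', 'NE', 'NW']
```

The relative order within each player's actions is preserved; only the interleaving changes, which is what allows $b_j$ to be matched with $w_j$ in rule D3.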


Simulation Relation. Given an HIG $G_H$, we first define the corresponding
DAG $G_D$.


Definition 8 (Correspondence). Given an HIG $G_H$, a corresponding DAG
$G_D = D[G_H]$ is a DAG that follows the semantic rules displayed in Fig. 7.


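$s_0 = s_{II}(t_0, u_0, \Omega_0)$ | $\hat{s}_0 = \hat{s}_{II}(\hat{t}_0, \hat{u}_0, w_0, 0)$ s.t. $\hat{t}_0 = t_0$, $\hat{u}_0 = u_0$

$s_{II}(t, u, \Omega) \xrightarrow{a_i} s_I(t', u', \Omega')$ | $\hat{s}_{II}(\hat{t}, \hat{u}, w, 0) \xrightarrow{a_i} \hat{s}_{II}(\hat{t}, \hat{u}', w', 0)$ s.t. $\hat{u}' = u'$

$s_{II}(t, u, \Omega) \xrightarrow{\theta} s_{\bigcirc}(t, u, \Omega')$ | $\hat{s}_{II}(\hat{t}, \hat{u}, w, 0) \xrightarrow{\theta} \hat{s}_I(\hat{t}, \hat{u}, w', 0)$ s.t. $\hat{u} = u$

$s_I(t, u, \Omega) \xrightarrow{b_i} s_{II}(t', u', \Omega')$ | $\hat{s}_I(\hat{t}, \hat{u}, w, j) \xrightarrow{b_i} \hat{s}_I(\hat{t}', \hat{u}', w, j+1)$ s.t. $\hat{t}' = t'$, $j < |w|$

$s_I(t, u, \Omega) \xrightarrow{b_i} s_{II}(t', u', \Omega')$ | $\hat{s}_I(\hat{t}, \hat{u}, w, j) \xrightarrow{b_i} \hat{s}_{\bigcirc}(\hat{t}', \hat{u}', w, j)$ s.t. $\hat{t}' = t'$, $j = |w|$

$s_{\bigcirc}(t, u, \Omega) \xrightarrow{p_i} s_{II}(t', u', \Omega')$

$s_{\bigcirc}(t, u, \Omega) \xrightarrow{1-p_i} s_{II}(t', u', \Omega')$

Fig. 7. Semantic rules for HIG-to-DAG transformation.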


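For the rest of this section, we consider $\mathcal{G}_D = D[\mathcal{G}_H]$, and use $\varrho \in TS(\mathcal{G}_H)$ and $\hat{\varrho} \in TS(\mathcal{G}_D)$ to denote two execution fragments of the HIG and DAG, respectively. We say that $\varrho$ and $\hat{\varrho}$ are similar, denoted by $\varrho \sim \hat{\varrho}$, iff $first(\varrho) \sim first(\hat{\varrho})$, $last(\varrho) \sim last(\hat{\varrho})$, and $\overline{move(\varrho)} = move(\hat{\varrho})$.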


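Definition 9 (Game Proper Simulation). A game $\mathcal{G}_D$ properly simulates $\mathcal{G}_H$, denoted by $\mathcal{G}_D \rightsquigarrow \mathcal{G}_H$, iff $\forall \varrho \in Prop(\mathcal{G}_H)$, $\exists \hat{\varrho} \in Prop(\mathcal{G}_D)$ such that $\varrho \sim \hat{\varrho}$.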


Before proving the existence of the simulation relation, we first show that if a move is executed on two similar states, then the terminal states are similar.


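Lemma 1 (Terminal States Similarity). For any $s_0 \sim \hat{s}_0$ and a deterministic $\varrho \in TS(\mathcal{G}_H)$ where $first(\varrho) = s_0$ and $last(\varrho) \in S_0$, $last(\varrho) \sim \langle \overline{move(\varrho)} \rangle(\hat{s}_0)$ holds.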
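Proof. Let $last(\varrho_i) = s^{(i)}(t_i, u_i, \Omega_i)$ and $\langle \overline{move(\varrho_i)} \rangle(\hat{s}_0) = \hat{s}^{(i)}(\hat{t}_i, \hat{u}_i, w_i, j_i)$, where $move(\varrho_i) = a_1 b_1 \ldots a_i b_i \theta$. We then write $\overline{move(\varrho_i)} = a_1 \ldots a_i \theta b_1 \ldots b_i$. We use induction over $i$ as follows:

– Base ($i = 0$): $\varrho_0 = s_0$; hence, $s^{(0)} \sim \hat{s}^{(0)}$, where $\hat{u}_0 = u_0$ and $\hat{t}_0 = t_0$.

– Induction ($i > 0$): Assume that the claim holds for $\overline{move(\varrho_{i-1})} = a_1 \ldots a_{i-1} \theta b_1 \ldots b_{i-1}$, i.e., $\hat{u}_{i-1} = u_{i-1}$ and $\hat{t}_{i-1} = t_{i-1} \in \Omega_{i-1}$. For $\varrho_i$, we have that $u_i = Eff(a_i, u_{i-1})$ and $\hat{u}_i = Eff(a_i, \hat{u}_{i-1})$. Also, $t_i = Eff(b_i, t_{i-1}) \in \Omega_i$ and $\hat{t}_i = Eff(b_i, \hat{t}_{i-1})$. Hence, $\hat{u}_i = u_i$ and $\hat{t}_i = t_i \in \Omega_i$. Thus, $s^{(i)} \sim \hat{s}^{(i)}$ holds. The same can be shown for $move(\varrho) = a_1 b_1 \ldots a_i b_i$, where no $\theta$ occurs.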


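Theorem 1 (Probabilistic Simulation). For any $s_0 \sim \hat{s}_0$ and $\varrho \in Prop(\mathcal{G}_H)$ where $first(\varrho) = s_0$, it holds that

$$\Pr[last(\varrho) = s'] = \Pr[\langle \overline{move(\varrho)} \rangle(\hat{s}_0) = \hat{s}'] \quad \forall \hat{s}' \text{ s.t. } s' \sim \hat{s}'.$$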


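Proof. We can rewrite $\varrho$ as $\varrho = \varrho_0 \xrightarrow{p_1} \varrho_1 \xrightarrow{p_2} \cdots \varrho_{n-1} \xrightarrow{p_n} s^{(n)}$, where $\varrho_0, \varrho_1, \ldots, \varrho_{n-1}$ are deterministic. Let $first(\varrho_i) = s^{(i)}(t_i, u_i, \Omega_i)$, $last(\varrho_i) = \tilde{s}^{(i)}(t'_i, u'_i, \Omega'_i)$, and $\langle \overline{move(\varrho)} \rangle(\hat{s}_0) = \hat{s}^{(n)}(\hat{t}_n, \hat{u}_n, w_n, j_n)$. We use induction over $n$ as follows:

– Base ($n = 0$): for $\varrho$ to be deterministic and proper, $\varrho = s^{(0)}$ must hold.

– Case ($n = 1$): $p_1 = p(t'_0, u'_0)$. From Lemma 1, $\hat{t}_1 = t_1$ and $\hat{u}_1 = u_1$. Hence, $\Pr[last(\varrho) = s^{(1)}] = \Pr[\langle \overline{move(\varrho)} \rangle(\hat{s}_0) = \hat{s}^{(1)}] = p(t'_0, u'_0)$ and $s^{(1)} \sim \hat{s}^{(1)}$.

– Induction ($n > 1$): It is straightforward to infer that $p_n = p(t'_{n-1}, u'_{n-1})$; hence,

$$\Pr[last(\varrho) = s^{(n)}] = \Pr[\langle \overline{move(\varrho)} \rangle(\hat{s}_0) = \hat{s}^{(n)}] = P, \quad \text{and} \quad s^{(n)} \sim \hat{s}^{(n)}.$$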


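Note that in case of multiple $\theta$ attempts, the above probability $P$ satisfies

$$P = \prod_{i=1}^{n} \Big( p_i\big(t^{(i)}_{m_i}, u^{(i)}_{m_i}\big) \prod_{j=1}^{m_i - 1} \big(1 - p_i(t^{(i)}_j, u^{(i)}_j)\big) \Big),$$

where $m_i$ is the number of $\theta$ attempts at stage $i$, the first $m_i - 1$ of which fail, and $(t^{(i)}_j, u^{(i)}_j)$ are the true value and belief at the $j$-th attempt. Finally, since Theorem 1 imposes no constraints on $move(\varrho)$, a DAG can simulate all proper executions that exist in the corresponding HIG.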
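As a worked illustration of this product, the Python sketch below computes the probability of a single proper execution from the success probabilities of its revealing attempts, reading the formula as: at every stage, all attempts but the last fail and the last one succeeds. The function name and the input layout (one list of attempt probabilities per stage) are assumptions made for illustration.

```python
from math import prod
from typing import Sequence


def execution_probability(stages: Sequence[Sequence[float]]) -> float:
    """stages[i] lists p(t, u) for each theta attempt made at stage i + 1;
    the last attempt of every stage succeeds and all earlier ones fail."""
    total = 1.0
    for attempts in stages:
        if not attempts:
            raise ValueError("every stage ends with a successful attempt")
        failed = prod(1.0 - p for p in attempts[:-1])  # (1 - p) per failed attempt
        total *= failed * attempts[-1]                 # times the final success
    return total


# Two stages: success at the first attempt (0.5), then one failure (0.2)
# followed by a success (0.6): 0.5 * (1 - 0.2) * 0.6 = 0.24
print(round(execution_probability([[0.5], [0.2, 0.6]]), 6))
# prints: 0.24
```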


Theorem 2 (DAG-HIG Simulation). For any HIG $\mathcal{G}_H$, there exists a DAG $\mathcal{G}_D = D[\mathcal{G}_H]$ such that $\mathcal{G}_D \rightsquigarrow \mathcal{G}_H$ (as defined in Definition 9).


4 Properties of DAG and DAG-based Synthesis 


Here, we discuss features of DAGs, including how a DAG can be decomposed into subgames by restricting the simulation to finite executions, and how safety properties are preserved, before proposing a DAG-based synthesis framework.
Transitions. In DAGs, the nondeterministic actions of different players carry different semantics. Specifically, $PL_I$ nondeterminism captures what is known about the adversarial behavior, rather than exact actions, where $PL_I$ actions are constrained by the earlier $PL_{II}$ action. Conversely, $PL_{II}$ nondeterminism abstracts the player's decisions. This distinction reflects how DAGs can be used for strategy synthesis under hidden information. To illustrate this, suppose that a strategy $\pi_{II}$ is to be obtained based on a worst-case scenario. In that case, the game is explored for all possible adversarial behaviors. Yet, if a strategy $\pi_I$ of $PL_I$ is known, a counter-strategy $\pi_{II}$ can be found by constructing $\mathcal{G}_D^{\pi_I}$.

Probabilistic behaviors in DAGs are captured by $PL_0$, which is characterized by the transition function $\delta: \hat{S}_0 \times \hat{S}_{II} \to [0, 1]$. The specific definition of $\delta$ depends on the modeled system. For instance, if the transition function (i.e., the probability) is state-independent, i.e., $\delta(\hat{s}_0, \hat{s}_{II}) = c$ for some constant $c \in [0, 1]$, the obtained model becomes trivial. Yet, with a state-dependent transition function, i.e., $\delta(\hat{s}_0, \hat{s}_{II}) = p(\hat{t}, \hat{u})$, the probability that $PL_{II}$ successfully reveals the true value depends on both the belief and the true value, and the transition function can then be realized since $\hat{s}_0$ holds both $\hat{t}$ and $\hat{u}$.


Decomposition. Consider an execution $\hat{\varrho}^* = \hat{s}_0 a_1 \hat{s}_1 a_2 \ldots$ that describes a scenario where $PL_{II}$ performs infinitely many actions with no attempt to reveal the true value. To simulate $\hat{\varrho}^*$, the word $w$ needs to grow infinitely. Since we are interested in finite executions, we impose stopping criteria on the DAG, such that the game is trapped whenever $|w| = h_{max}$ holds, where $h_{max} \in \mathbb{N}$ is an upper horizon. We formalize the stopping criteria as a deterministic finite automaton (DFA) that, when composed with the DAG, traps the game whenever the stopping criteria hold. Note that imposing an upper horizon by itself is not a sufficient criterion for a DAG to be considered a stopping game [8]. Conversely, consider a proper (and hence finite) execution $\hat{\varrho} = \hat{s}_0 a_1 \ldots \hat{s}'$, where $\hat{s}_0, \hat{s}' \in Prop(\mathcal{G}_D)$. From Definition 9, it follows that a DAG initial state is strictly proper, i.e., $\hat{s}_0 \in Prop(\mathcal{G}_D)$. Hence, when $\hat{s}'$ is reached, the game can be seen as if it were repeated with a new initial state $\hat{s}'$. Consequently, a DAG (complemented with stopping criteria) can be decomposed into a (possibly infinite) countable set of subgames that have the same structure yet different initial states.
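A minimal Python sketch of such a stopping criterion, assuming the DFA only needs to observe "write" symbols (a $PL_{II}$ action appended to $w$) and ignores every other symbol; a successful attempt ends the subgame, so the counter never has to be reset within one subgame. The class name and symbol encoding are hypothetical, not the paper's automaton.

```python
class HorizonDFA:
    """DFA with states 0..h_max tracking |w|; state h_max is an absorbing trap."""

    def __init__(self, h_max: int) -> None:
        if h_max < 1:
            raise ValueError("h_max must be a positive integer")
        self.h_max = h_max
        self.state = 0  # current |w|

    @property
    def trapped(self) -> bool:
        return self.state == self.h_max

    def step(self, symbol: str) -> int:
        """Advance on one symbol: only 'write' grows w, and the trap is absorbing."""
        if not self.trapped and symbol == "write":
            self.state += 1
        return self.state


dfa = HorizonDFA(h_max=3)
for sym in ["write", "theta", "write", "write", "write"]:
    dfa.step(sym)
print(dfa.state, dfa.trapped)
# prints: 3 True
```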


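Definition 10 (DAG Subgames). The subgames of a $\mathcal{G}_D$ are defined by the set $\{\mathcal{G}_D^{(i)} \mid i \in \mathbb{N}_0\}$, where each subgame $\mathcal{G}_D^{(i)}$ shares the structure of $\mathcal{G}_D$ but starts from its own initial state $\hat{s}_0^{(i)} \in Prop(\mathcal{G}_D)$.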


Intuitively, each subgame either reaches a proper state (representing the initial state of another subgame) or terminates at the upper horizon. This decomposition allows for the independent (and parallel) analysis of individual subgames, which can drastically reduce both the time required for synthesis and the explored state space, and hence improve scalability. An example of this decompositional approach is provided in Sect. 5.
Preservation of Safety Properties. In DAGs, the action $\theta$ denotes a transition from $PL_{II}$ to $PL_I$ states and thus the execution of any delayed actions. While this action can simply describe a revealing attempt, it can also serve as a what-if analysis of how the true value may evolve at stage $i$ of a subgame. We refer to an execution of the second type as a hypothetical branch, where $Hyp(\hat{\varrho}, h)$ denotes the set of hypothetical branches from $\hat{\varrho}$ at stage $h \in \{1, \ldots, n\}$. Let $L_{safe}(s)$ be a labeling function denoting whether a state is safe. The formula $\Phi_{safe} := [G\ safe]$ is satisfied by an execution $\varrho$ in the HIG iff all $s(t, u, \Omega) \in \varrho$ are safe.

Now, consider $\hat{\varrho}$ of the DAG, with $\hat{\varrho} \sim \varrho$. We identify the following three cases:


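(a) $L_{safe}(s)$ depends only on the belief $u$; then $\varrho \models \Phi_{safe}$ iff all $\hat{s}_{II} \in \hat{\varrho}$ are safe;

(b) $L_{safe}(s)$ depends only on the true value $t$; then $\varrho \models \Phi_{safe}$ iff all $\hat{\varrho}_n \in Hyp(\hat{\varrho}, n)$ are safe; and

(c) $L_{safe}(s)$ depends on both the true value $t$ and the belief $u$; then $\varrho \models \Phi_{safe}$ iff $last(\hat{\varrho}_h)$ is safe for all $\hat{\varrho}_h \in Hyp(\hat{\varrho}, h)$, $h \in \{1, \ldots, n\}$, where $n$ is the number of $PL_{II}$ actions.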


Taking into account such relations, both safety (e.g., never encounter a hazard) and distance-based requirements (e.g., never exceed a subgame horizon) can be specified when using DAGs for synthesis, to ensure their satisfaction in the original model. This can be generalized to other reward-based synthesis objectives, which will be part of our future efforts that we discuss in Sect. 6.


Synthesis Framework. We propose a framework for strategy synthesis using DAGs, which is summarized in Fig. 8. We start by formulating the automata $M_I$, $M_{II}$ and $M_0$, representing the abstract behaviors of $PL_I$, $PL_{II}$ and $PL_0$, respectively. Next, a FIFO memory stack $(m_i)_{i=1}^{n} \in A_{II}^{n}$ is implemented using two automata, $M_{mrd}$ and $M_{mwr}$, to perform reading and writing operations, respectively. The DAG $\mathcal{G}_D$ is constructed by following Algorithm 1. The game starts with $PL_{II}$ moves until she executes a revealing attempt $\theta$, allowing $PL_I$ to play her delayed actions. Once an end criterion is met, the game terminates, resembling conditions such as 'running out of fuel' or 'reaching map boundaries'.


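[Figure: block diagram of the framework, linking the primary components ($M_I$, $M_{II}$, $M_0$) and the auxiliary components ($M_{mrd}$, $M_{mwr}$) through composition and DAG construction (Algorithm 1), together with a model-refinement step.]

Fig. 8. Synthesis and analysis framework based on the use of DAGs.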


5 Specific implementation details are described in Sect. 5. 
Algorithm 1. Procedure for DAG construction

Input: Components M_I, M_II, M_0, M_mwr, M_mrd; initial state ŝ_0
Result: DAG G_D

1 while ¬(end criterion) do
2   while a ≠ θ do                            ▷ PL_II plays until a revealing attempt
3     M_II.v_B ← Eff(a, v_B), M_mwr.write(a, ++wr)
4   while rd < wr do                          ▷ PL_I plays all delayed actions
5     M_mrd.read(a, ++rd), M_I.v_T ← Eff(β(a), v_T)
6   if draw x ∼ Brn(p(v_T, v_B)) then         ▷ PL_0 plays successful attempt
7     M_II.v_B ← M_I.v_T, wr ← 0, rd ← 0
8   else rd ← 0                               ▷ Unsuccessful attempt, forget PL_I actions
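A minimal Python sketch of the loop structure of Algorithm 1, simulating one sampled run rather than building the full game graph. Assumptions: states live on a 2D grid and Eff adds a move vector (the hypothetical `eff`); the $PL_I$ response to a delayed action (called `beta` here), the success probability `p`, and the Bernoulli sampler `draw` are supplied by the caller; and the end criterion is simply running out of scripted $PL_{II}$ segments. Because the pseudocode resets only `rd` on an unsuccessful attempt, this sketch additionally restores $v_T$ to its value at the last proper state before each replay, so that delayed actions are not applied twice; that rollback is an assumption of the sketch.

```python
from typing import Callable, List, Tuple

Vec = Tuple[int, int]


def eff(action: Vec, v: Vec) -> Vec:
    """Hypothetical effect function: move by the grid vector `action`."""
    return (v[0] + action[0], v[1] + action[1])


def run_dag(
    segments: List[List[Vec]],        # PL_II actions played before each theta
    beta: Callable[[Vec], Vec],       # PL_I response to a delayed PL_II action
    p: Callable[[Vec, Vec], float],   # success probability p(v_T, v_B)
    draw: Callable[[float], bool],    # Bernoulli sampler Brn(.)
    v0: Vec = (0, 0),
) -> List[tuple]:
    v_b = v_t = v_proper = v0         # belief, true value, last proper true value
    memory: List[Vec] = []            # FIFO written by M_mwr; wr == len(memory)
    log: List[tuple] = []
    for segment in segments:          # outer loop, one theta per iteration
        for a in segment:             # lines 2-3: PL_II plays, action is stored
            v_b = eff(a, v_b)
            memory.append(a)
        v_t = v_proper                # assumption: replay from the last proper state
        for a in memory:              # lines 4-5: PL_I plays all delayed actions
            v_t = eff(beta(a), v_t)
        if draw(p(v_t, v_b)):         # lines 6-7: successful attempt
            v_b = v_proper = v_t      # belief is corrected to the true value
            memory.clear()            # wr <- 0, rd <- 0
            log.append(("success", v_b))
        else:                         # line 8: rd <- 0, the PL_II writes are kept
            log.append(("fail", v_b, v_t))
    return log


outcomes = iter([False, True])
log = run_dag(
    segments=[[(0, 1)], [(0, 1)]],
    beta=lambda a: (a[0] + 1, a[1]),  # hypothetical spoofing: drift one cell east
    p=lambda t, b: 0.5,
    draw=lambda q: next(outcomes),    # scripted draws for reproducibility
)
print(log)
# prints: [('fail', (0, 1), (1, 1)), ('success', (2, 2))]
```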


Algorithm 2 describes the procedure for strategy synthesis based on the DAG $\mathcal{G}_D$ and an rPATL [6] synthesis query $\phi_{syn}$ that captures, for example, a safety requirement. Starting with the initial location, the procedure checks whether $\phi_{syn}$ is satisfied if action $\theta$ is performed at stage $h$, and updates the set of feasible strategies $\Pi_i$ for subgame $\mathcal{G}_i$ until $h_{max}$ is reached or $\phi_{syn}$ is not satisfied.⁶ Next, the set $\Pi_i$ is used to update the list of reachable end locations $\mathcal{L}$ with new initial locations of reachable subgames that should be explored. Finally, the composition of $\mathcal{G}_D$ and $\Pi_{II}$ resolves $PL_{II}$ nondeterminism, and the resulting model $\mathcal{G}_D^{\Pi_{II}}$ is a Markov Decision Process (MDP) of complete information that can be easily used for further analysis.
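The worklist structure of Algorithm 2 can be sketched in Python as follows. The subgame solver `synth` and the reachability function `reachable` are hypothetical stand-ins for the model-checker calls (e.g., rPATL synthesis in PRISM-games) and must be supplied by the caller; pruning of subgame strategies is omitted. As in the pseudocode, the horizon loop stops at the first stage for which no strategy exists, since larger horizons would fail as well.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Loc = Tuple[int, int]
Strategy = Tuple[str, float]  # hypothetical: (strategy id, satisfaction probability)


def synthesize(
    x0: Loc,
    h_max: int,
    synth: Callable[[Loc, int], Optional[Strategy]],
    reachable: Callable[[Loc, List[Tuple[int, Strategy]]], Set[Loc]],
) -> Dict[Loc, List[Tuple[int, Strategy]]]:
    """Explore every reachable subgame once, in discovery order."""
    frontier: List[Loc] = [x0]                  # the list L of subgame start locations
    strategies: Dict[Loc, List[Tuple[int, Strategy]]] = {}
    i = 0
    while i < len(frontier):                    # explore all reachable subgames
        loc = frontier[i]
        found: List[Tuple[int, Strategy]] = []
        h = 1
        while h < h_max:                        # explore subgame up to the horizon
            result = synth(loc, h)              # check phi_syn with theta at stage h
            if result is None:                  # no strategy: larger h fails as well
                break
            found.append((h, result))
            h += 1
        strategies[loc] = found
        for nxt in sorted(reachable(loc, found) - set(frontier)):
            frontier.append(nxt)                # L <- L . (Reachable \ L)
        i += 1
    return strategies


edges = {(0, 0): {(1, 0), (0, 1)}, (1, 0): {(1, 1)}, (0, 1): {(1, 1)}, (1, 1): set()}
result = synthesize(
    x0=(0, 0), h_max=4,
    synth=lambda loc, h: ("pi", 0.9) if h <= 2 else None,
    reachable=lambda loc, found: edges[loc] if found else set(),
)
print(list(result))
# prints: [(0, 0), (0, 1), (1, 0), (1, 1)]
```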


5 Case Study 


In this section, we consider a case study where a human operator supervises 
a UAV prone to stealthy attacks on its GPS sensor. The UAV mission is to 
visit a number of targets after being airborne from a known base (initial state), 
while avoiding hazard zones that are known a priori. Moreover, the presence 
of adversarial stealthy attacks via GPS spoofing is assumed. We use the DAG 
framework to synthesize strategies for both the UAV and an operator advisory 
system (AS) that schedules geolocation tasks for the operator. 


Modeling. We model the system as a delayed-action game $\mathcal{G}_D$, where $PL_I$ and $PL_{II}$ represent the adversary and the UAV-AS coalition, respectively. Figure 9 shows the model's primary and auxiliary components. In the UAV model $M_{uav}$, $\mathbf{x}_B = (x_B, y_B)$ encodes the UAV belief, and $A_{uav} = \{N, S, E, W, NE, NW, SE, SW\}$ is the set of available movements. The AS can trigger the action activate to initiate a geolocation task, attempting to confirm the current location. The adversary behavior is abstracted by $M_{adv}$, where $\mathbf{x}_T = (x_T, y_T)$ encodes the UAV true location. The adversarial actions are limited to one directional increment at most.⁷ If, for example, the UAV is heading N, then the adversary set of actions is $G(N) = \{N, NE, NW\}$. The auxiliary components $M_{mwr}$ and $M_{mrd}$ manage a FIFO memory stack $(m_i)_{i=0}^{n-1} \in A_{uav}^{n}$. The last UAV movement is saved in $m_i$ by synchronizing $M_{mwr}$ with $M_{uav}$ via write, while $M_{mrd}$ synchronizes with $M_{adv}$ via read to read the next UAV action from $m_j$. The subgame terminates whenever action write is attempted and $M_{mwr}$ is at state $n$ (i.e., out of memory).

⁶ Failing to find a strategy at stage $i$ implies the same for all horizons of size $j > i$.

Algorithm 2. Procedure for strategy synthesis
Input: Initial location (x_0, y_0), synthesis query φ_syn
Output: PL_II strategies Π_II

1  L ← [(x_0, y_0)], i ← 0
2  while i < |L| do                                  ▷ Explore all reachable subgames
3    ŝ_0 ← (L[i], L[i], ε, 0, II), h ← 1, stop ← ⊥    ▷ Construct initial state
4    while h < h_max ∧ ¬stop do                      ▷ Explore subgame till upper horizon
5      (π_h, p) ← Synth(G_i^h, φ_syn)                ▷ Synthesize strategy for horizon h
6      if π_h ≠ ∅ then
7        Π_i ← Π_i ∪ (h, π_h, p), h++                ▷ Save synthesized strategy
8      else stop ← ⊤
9    Prune(Π_i), Π_II ← Π_II ∪ Π_i                  ▷ Prune subgame strategies
10   L ← L · (Reachable(Π_i) \ L), i++               ▷ Update reachability

[Figure: automata $M_{uav}$, $M_{adv}$, $M_{as}$, $M_{mwr}$ and $M_{mrd}$, with guards on the active player $pl \in \{uav, adv, as\}$, the belief update $\mathbf{x}_B := \mathbf{x}_B + f(d)$ for $d \in A_{uav}$, the true-location update $\mathbf{x}_T := \mathbf{x}_T + g(d_T)$, and synchronization labels activate, fly, locate, update, check, read and write.]

Fig. 9. Primary DAG components: UAV ($M_{uav}$), adversary ($M_{adv}$), and AS ($M_{as}$). Auxiliary DAG components: memory write ($M_{mwr}$) and memory read ($M_{mrd}$) models, capturing the DAG representation. At stage $i$, the next memory location to write/read is $m_i$.
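The one-directional-increment constraint on the adversary can be sketched in Python. This assumes a grid where x grows eastwards and y northwards, and that both $f$ and $g$ map a heading to a unit grid step; the names `adversary_actions` and `move` are hypothetical, and $f$ and $g$ are not specified further in this text.

```python
from typing import Dict, Set, Tuple

HEADINGS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]  # clockwise order
STEP: Dict[str, Tuple[int, int]] = {                     # assumed unit grid steps
    "N": (0, 1), "NE": (1, 1), "E": (1, 0), "SE": (1, -1),
    "S": (0, -1), "SW": (-1, -1), "W": (-1, 0), "NW": (-1, 1),
}


def adversary_actions(d: str) -> Set[str]:
    """Headings within one 45-degree increment of the UAV heading d."""
    k = HEADINGS.index(d)
    return {HEADINGS[(k - 1) % 8], d, HEADINGS[(k + 1) % 8]}


def move(x_b: Tuple[int, int], x_t: Tuple[int, int], d: str, d_t: str):
    """Belief follows the commanded heading d; the true location follows d_t."""
    if d_t not in adversary_actions(d):
        raise ValueError(f"{d_t} is not an admissible deviation from {d}")
    (bx, by), (tx, ty) = STEP[d], STEP[d_t]
    return (x_b[0] + bx, x_b[1] + by), (x_t[0] + tx, x_t[1] + ty)


print(sorted(adversary_actions("N")))
# prints: ['N', 'NE', 'NW']
print(move((2, 2), (2, 2), "N", "NE"))
# prints: ((2, 3), (3, 3))
```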

The goal is to find strategies for the UAV-AS coalition based on the following: 


– Target reachability. To overcome cases where targets are unreachable due to hazard zones, the label reach is assigned to the set of states with acceptable checkpoint locations (including the target) to render the objective incrementally feasible. The objective for all encountered subgames is then formalized as $P_{\max}[F\ reach] \geq p_{min}$ for some bound $p_{min}$.

– Hazard avoidance. Similar to target reachability, the label hazard is assigned to states corresponding to hazard zones. The objective $P_{\max}[G\ \neg hazard] \geq p_{min}$ is then specified for all encountered subgames.

⁷ To detect aggressive attacks, techniques from the literature (e.g., [16,25,26]) can be used.


By refining the aforementioned objectives, synthesis queries are used for both the subgames and the supergame. Specifically, the query

$$\phi_{syn}(k) := \langle\langle uav \rangle\rangle P_{\max=?}\,[\neg hazard\ U^{\leq k}\ (locate \wedge reach)] \qquad (1)$$

is specified for each encountered subgame $\mathcal{G}_i$, where locate indicates a successful geolocation task. By following Algorithm 2 for all reachable subgames, the supergame is reduced to an MDP $\mathcal{G}_D^{\Pi_{II}}$ (whose states are the reachable subgames), which is checked against the query

$$\phi_{ana}(n) := \langle\langle adv \rangle\rangle P_{\min,\max=?}\,[F\ target] \qquad (2)$$

to find the bounds on the probability that the target is reached under a maximum number of geolocation tasks $n$.


Experimental Results. Figure 10(a) shows the map setting used in the experiments. The UAV's ability to actively detect an attack depends on both its belief and the ground truth. Specifically, the probability of success in a geolocation task mainly relies on the disparity between the belief and true locations, captured by $f_{dis}: Ev(\mathbf{x}_B) \times Ev(\mathbf{x}_T) \to [0, 1]$, which is obtained by assigning probabilities to each pair of locations according to their features (e.g., landmarks) and smoothing them using a 2D Gaussian filter. A thorough experimental analysis where probabilities are extracted from experiments with human operators is described in [11]. The set of hazard zones includes the map boundaries to prevent the UAV from reaching boundary values. Also, the adversary is prohibited from launching attacks for at least the first step, a practical assumption to prevent the UAV model from infinitely bouncing around the target location.
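Purely to illustrate the smoothing step, the following Python sketch applies a normalized 2D Gaussian kernel to a hypothetical grid of per-location scores in [0, 1] (for example, landmark visibility), renormalizing the weights at the map border so that smoothed values stay in [0, 1]. The score grid, $\sigma$ and the kernel radius are assumptions; the actual assignment of $f_{dis}$ over pairs of locations is described in [11] and is not modeled here.

```python
import math
from typing import List

Grid = List[List[float]]


def gaussian_kernel(sigma: float, radius: int) -> Grid:
    """Normalized (2r + 1) x (2r + 1) Gaussian kernel."""
    k = [[math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))
          for dx in range(-radius, radius + 1)]
         for dy in range(-radius, radius + 1)]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]


def smooth(grid: Grid, sigma: float = 1.0, radius: int = 1) -> Grid:
    """Gaussian-weighted average of each cell's neighbourhood; weights are
    renormalized at the border, so values in [0, 1] stay in [0, 1]."""
    h, w = len(grid), len(grid[0])
    k = gaussian_kernel(sigma, radius)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = wsum = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        weight = k[dy + radius][dx + radius]
                        acc += weight * grid[yy][xx]
                        wsum += weight
            out[y][x] = acc / wsum
    return out


# Hypothetical score map: a single landmark in the middle of a 3 x 3 area.
scores = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
smoothed = smooth(scores)
print(round(smoothed[1][1], 3))
# prints: 0.204
```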

We implemented the model in PRISM-games [7,19] and performed the experiments on an Intel Core i7 4.0 GHz CPU, with 10 GB RAM dedicated to the tool. Figure 10(b) shows the supergame obtained by following the procedure in Algorithm 2. A vertex $\mathcal{G}_{x,y}$ represents a subgame (composed with its strategy) that starts at location $(x, y)$, while the outgoing edges point to subgames reachable from the current one. Note that each edge represents a probabilistic transition. Subgames with more than one outgoing transition imply nondeterminism that is resolved by the adversary actions. Hence, the directed graph depicts an MDP.

The synthesized strategy for ($h_{adv} = 2$, $h = 4$) is demonstrated in Fig. 10(c). For the initial subgame, Fig. 11(a) shows the maximum probability of a successful geolocation task if performed at stage $h$, and the remaining distance to target. Assuming the adversary can launch attacks after stage $h_{adv} = 2$, the detection probability is maximized by performing the geolocation task at step 4, and hazard areas can still be avoided up to $h = 6$. For $h_{adv} = 1$, however, $h = 3$ has the highest probability of success, which diminishes at $h = 6$ as no possible flight plan exists without encountering a hazard zone. The effect of the maximum number of geolocation tasks ($n$) on target reachability is studied by analyzing the supergame against $\phi_{ana}$, as shown in Fig. 11(b). The minimum number of geolocation tasks to guarantee a non-zero probability of reaching the target (regardless of the adversary strategy) is 3, with probability bounds of (33.7%, 94.4%).

Fig. 10. (a) The environment setup used for the case study; (b) the induced supergame MDP, where the subgames form its states; and (c) the synthesized protocols.
Fig. 11. Analysis results for (a) subgame $\mathcal{G}_{5,1}$ and (b) supergame $\mathcal{G}_D$.


The experimental data obtained for this case study are listed in Table 1. For the same grid size, more complex maps require more time for synthesis, while the state space size remains unaffected. The state space grows exponentially with the explored horizon size, i.e., $O((|A_{uav}||A_{adv}|)^h)$, and its growth is typically slowed by, e.g., the presence of hazard areas, since the branches of the game transitions are trimmed upon encountering such areas. Interestingly, for $h = 6$ and $h = 7$, while the model construction time and model size for $h_{adv} = 1$ are roughly 1.6 to 1.7 times those for $h_{adv} = 2$, the time for checking $\phi_{syn}$ declines in comparison. This reflects the fact that, for $h_{adv} = 1$ compared to $h_{adv} = 2$, the UAV has higher chances of reaching a hazard zone for the same $k$, leading to a shorter time for model checking.
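A quick calculation on the numbers in Table 1 shows how far below the worst-case bound the observed growth stays. The sketch below takes $|A_{uav}| = 8$ movements and $|A_{adv}| = 3$ admissible adversary headings (as in the example $G(N)$), which gives a worst-case per-step factor of 24, and compares it with the measured per-step growth in the number of states of the subgame.

```python
# States of the subgame for k = 4..7, copied from Table 1.
states = {
    1: [11_608, 57_129, 236_714, 876_550],   # h_adv = 1
    2: [6_678, 33_904, 141_622, 524_942],    # h_adv = 2
}
worst_case = 8 * 3  # |A_uav| * |A_adv| under the assumption stated above

for h_adv, row in states.items():
    ratios = [b / a for a, b in zip(row, row[1:])]  # growth per extra step
    print(h_adv, [round(r, 2) for r in ratios], max(ratios) < worst_case)
# prints:
# 1 [4.92, 4.14, 3.7] True
# 2 [5.08, 4.18, 3.71] True
```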


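Table 1. Results for strategy synthesis using queries $\phi_{syn}$ and $\phi_{ana}$. Model size is given in states, transitions and choices; times are in seconds.

| Game | Map | $h_{adv}$ | $k$ | States | Transitions | Choices | Model (s) | $\phi_{syn}$ (s) | $\phi_{ana}$ (s) |
|---|---|---|---|---:|---:|---:|---:|---:|---:|
| Subgame $\mathcal{G}_{5,1}$ | 8×8 | 1 | 4 | 11,608 | 17,397 | 15,950 | 2.810 | 0.072 | – |
| | | | 5 | 57,129 | 87,865 | 83,267 | 14.729 | 0.602 | – |
| | | | 6 | 236,714 | 366,749 | 359,234 | 62.582 | 1.293 | – |
| | | | 7 | 876,550 | 1,365,478 | 1,355,932 | 231.741 | 6.021 | – |
| | | 2 | 4 | 6,678 | 9,230 | 8,394 | 2.381 | 0.042 | – |
| | | | 5 | 33,904 | 48,545 | 45,354 | 10.251 | 0.367 | – |
| | | | 6 | 141,622 | 204,551 | 198,640 | 37.192 | 1.839 | – |
| | | | 7 | 524,942 | 763,144 | 754,984 | 145.407 | 8.850 | – |
| Supergame $\mathcal{G}_D$ | | | | 6,212 | 8,306 | 6,660 | 2.216 | – | 2.490 |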


6 Discussion and Conclusion 


In this paper, we introduced DAGs and showed how they can simulate HIGs by delaying players' actions. We also derived a DAG-based framework for strategy synthesis and analysis using off-the-shelf SMG model checkers. Under some practical assumptions, we showed that DAGs can be decomposed into independent subgames, utilizing parallel computation to reduce the time needed for model analysis, as well as the size of the state space. We further demonstrated the applicability of the proposed framework on a case study focused on synthesis and analysis of active attack detection strategies for UAVs prone to cyber attacks.

DAGs come at the cost of increasing the total state space size as $M_{mrd}$ and $M_{mwr}$ are introduced. This does not present a significant limitation due to the compositional approach towards strategy synthesis using subgames. However, the synthesis is still limited to model sizes that off-the-shelf tools can handle.

The concept of delaying actions implicitly assumes that the adversary knows the UAV actions a priori. This does not present a concern in the presented case study, as an abstract (i.e., nondeterministic) adversary model is analogous to synthesizing against the worst-case attacking scenario. Nevertheless, strategies synthesized using DAGs (and SMGs in general) are inherently conservative. Depending on the considered system, this can easily lead to no feasible solution.
The proposed synthesis framework ensures preservation of safety properties. Yet, general reward-based strategy synthesis is to be approached with care. For
example, rewards dependent on the belief can appear in any state, and exploring 
hypothetical branches is not required. However, rewards dependent on a state’s 
true value should only appear in proper states, and all hypothetical branches are 
to be explored. A detailed investigation of how various properties are preserved 
by DAGs, along with multi-objective synthesis, is a direction for future work. 
Abstract. We propose an automated verification technique for hypersafety properties, which express sets of valid interrelations between multiple finite runs of a program. The key observation is that constructing a proof for a small representative set of the runs of the product program (i.e., the product of several copies of the program with itself), called a reduction, is sufficient to formally prove the hypersafety property about the program. We propose an algorithm based on a counterexample-guided refinement loop that simultaneously searches for a reduction and a proof of correctness for the reduction. We demonstrate that our tool WEAVER is very effective in verifying a diverse array of hypersafety properties for a diverse class of input programs.


1 Introduction 


A hypersafety property describes the set of valid interrelations between multiple 
finite runs of a program. A k-safety property [7] is a program safety property 
whose violation is witnessed by at least k finite runs of a program. Determinism 
is an example of such a property: non-determinism can only be witnessed by 
two runs of the program on the same input which produce two different outputs. 
This makes determinism an instance of a 2-safety property. 

The vast majority of existing program verification methodologies are geared towards verifying standard (1-)safety properties. This paper proposes an approach to automatically reduce verification of k-safety to verification of 1-safety, and hence a way to leverage existing safety verification techniques for hypersafety verification. The most straightforward way to do this is via self-composition [5], where verification is performed on k memory-disjoint copies of the program, sequentially composed one after another. Unfortunately, the proofs in these cases are often very verbose, since the full functionality of each copy has to be captured by the proof. Moreover, when it comes to automated verification, the invariants required to verify such programs are often well beyond the capabilities of modern solvers [26], even for very simple programs and properties.

The more practical approach, which is typically used in manual or automated proofs of such properties, is to compose k memory-disjoint copies of the program in parallel (instead of in sequence), and then verify some reduced program obtained by removing redundant traces from the program formed in the previous step. This parallel product program can have many such reductions.
© The Author(s) 2019 
For example, the program formed from sequential self-composition is one such reduction of the parallel product program. Therefore, care must be taken to choose a "good" reduction that admits a simple proof. Many existing approaches limit themselves to a narrow class of reductions, such as the one where each copy of the program executes in lockstep [3,10,24], or define a general class of reductions but do not provide algorithms with guarantees of covering the entire class [4,24].

We propose a solution that combines the search for a safety proof with the search for an appropriate reduction, in a counterexample-based refinement loop. Instead of settling on a single reduction in advance, we try to verify the entire (possibly infinite) set of reductions simultaneously and terminate as soon as some reduction is successfully verified. If the proof is not currently strong enough to cover at least one of the represented program reductions, then an appropriate set of counterexamples is generated that guarantees progress towards a proof.

Our solution is language-theoretic. We propose a way to represent sets of 
reductions using infinite tree automata. The standard safety proofs are also 
represented using the same automata, which have the desired closure properties. 
This allows us to check if a candidate proof is in fact a proof for one of the 
represented program reductions, with reasonable efficiency. 

Our approach is not uniquely applicable to hypersafety properties of sequential programs. Our proposed set of reductions naturally works well for concurrent programs, and can be viewed in the spirit of reduction-based methods such as those proposed in [11,21]. This makes our approach particularly appealing when it comes to verification of hypersafety properties of concurrent programs, for example, proving that a concurrent program is deterministic. The parallel composition for hypersafety verification mentioned above and the parallel composition of threads inside the multi-threaded program are treated in a uniform way by our proof construction and checking algorithms. In summary:


– We present a counterexample-guided refinement loop that simultaneously searches for a proof and a program reduction in Sect. 7. This refinement loop relies on an efficient algorithm for proof checking based on the antichain method of [8], and comes with strong theoretical progress guarantees.

– We propose an automata-based approach to representing a class of program reductions for k-safety verification. In Sect. 5, we describe the precise class of automata we use and show how their use leads to an effective proof checking algorithm incorporated in our refinement loop.

– We demonstrate the efficacy of our approach in proving hypersafety properties of sequential and concurrent benchmarks in Sect. 8.


2 Illustrative Example 


We use a simple program MULT, that computes the product of two non-negative 
integers, to illustrate the challenges of verifying hypersafety properties and the 
type of proof that our approach targets. Consider the multiplication program in 
Fig. 1(i), and assume we want to prove that it is distributive over addition. 
    Mult:              Copy 1:                Copy 2:              Copy 3:
    ℓ1: i ← 0          ℓ1: i1 ← 0             ℓ1: i2 ← 0           ℓ1: i3 ← 0
    ℓ2: x ← 0          ℓ2: x1 ← 0             ℓ2: x2 ← 0           ℓ2: x3 ← 0
    ℓ3: while i < a    ℓ3: while i1 < a + b   ℓ3: while i2 < a     ℓ3: while i3 < b
    ℓ4:   x ← x + b    ℓ4:   x1 ← x1 + c      ℓ4:   x2 ← x2 + c    ℓ4:   x3 ← x3 + c
    ℓ5:   i ← i + 1    ℓ5:   i1 ← i1 + 1      ℓ5:   i2 ← i2 + 1    ℓ5:   i3 ← i3 + 1
    ℓ6:                ℓ6:                    ℓ6:                  ℓ6:
          (i)                                        (ii)

Fig. 1. Program MULT (i) and the parallel composition of three copies of it (ii).


In Fig. 1(ii), the parallel composition of MULT with two copies of itself is illustrated. The product program is formed for the purpose of proving distributivity, which can be encoded through the postcondition $x_1 = x_2 + x_3$. Since $a$, $b$, and $c$ are not modified in the program, the same variables are used across all copies. One way to prove that MULT is distributive is to come up with an inductive invariant $\phi_{ijk}$ for each location in the product program, represented by a triple of program locations $(\ell_i, \ell_j, \ell_k)$, such that $true \Rightarrow \phi_{111}$ and $\phi_{666} \Rightarrow x_1 = x_2 + x_3$. The main difficulty lies in finding assignments for locations such as $\phi_{611}$, which are points in the execution of the program where one thread has finished executing and the next one is starting. For example, at $(\ell_6, \ell_1, \ell_1)$ we need the assignment $\phi_{611} \leftarrow x_1 = (a + b) \times c$, which is non-linear. However, the program given in Fig. 1(ii) can be verified with simpler (linear) reasoning.

The following program is a semantically equivalent reduction of the full composition of Fig. 1(ii):

    i1 ← 0, i2 ← 0, i3 ← 0
    x1 ← 0, x2 ← 0, x3 ← 0
    while i2 < a
      x1 ← x1 + c
      x2 ← x2 + c
      i1 ← i1 + 1
      i2 ← i2 + 1
    while i3 < b
      x1 ← x1 + c
      x3 ← x3 + c
      i1 ← i1 + 1
      i3 ← i3 + 1

Consider the program $P = (\text{Copy 1} \parallel (\text{Copy 2}; \text{Copy 3}))$. The program above is equivalent to a lockstep execution of the two parallel components of $P$. The validity of this reduction is derived from the fact that the statements in each thread are independent of the statements in the other. That is, reordering the statements of different threads in an execution leads to an equivalent execution. It is easy to see that $x_1 = x_2 + x_3$ is an invariant of both while loops in the reduced program, and therefore, linear reasoning is sufficient to prove the postcondition for this program. Conceptually, this reduction (and its soundness proof) together with the proof of correctness for the reduced program constitute a proof that the original program MULT is distributive. Our proposed approach can come up with reductions like this and their corresponding proofs fully automatically. Note that a lockstep reduction of the program in Fig. 1(ii) would not yield a solution for this problem, and therefore the discovery of the right reduction is an integral part of the solution.
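Running the reduced program on small inputs illustrates why linear reasoning suffices: the assertion $x_1 = x_2 + x_3$ can be checked at every loop head. The Python transcription below (function names are illustrative) is only a test on sampled inputs, not a proof of distributivity.

```python
def mult(a: int, b: int) -> int:
    """Program MULT of Fig. 1(i): computes a * b for non-negative a."""
    i, x = 0, 0
    while i < a:
        x = x + b
        i = i + 1
    return x


def reduced_product(a: int, b: int, c: int) -> tuple:
    """Reduced product program: Copy 1 runs in lockstep with Copy 2, then with Copy 3."""
    i1 = i2 = i3 = 0
    x1 = x2 = x3 = 0
    while i2 < a:                    # Copy 1 || Copy 2
        assert x1 == x2 + x3         # linear invariant at the loop head
        x1, x2 = x1 + c, x2 + c
        i1, i2 = i1 + 1, i2 + 1
    while i3 < b:                    # Copy 1 || Copy 3
        assert x1 == x2 + x3
        x1, x3 = x1 + c, x3 + c
        i1, i3 = i1 + 1, i3 + 1
    assert i1 == a + b               # Copy 1's guard i1 < a + b is now false
    assert x1 == x2 + x3             # postcondition
    return x1, x2, x3


for a in range(4):
    for b in range(4):
        for c in range(4):
            assert reduced_product(a, b, c) == (mult(a + b, c), mult(a, c), mult(b, c))
print(reduced_product(3, 4, 5))
# prints: (35, 15, 20)
```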
3 Programs and Proofs


A non-deterministic finite automaton (NFA) is a tuple $A = (Q, \Sigma, \delta, q_0, F)$, where $Q$ is a finite set of states, $\Sigma$ is a finite alphabet, $\delta \subseteq Q \times \Sigma \times Q$ is the transition relation, $q_0 \in Q$ is the initial state, and $F \subseteq Q$ is the set of final states. A deterministic finite automaton (DFA) is an NFA whose transition relation is a function $\delta: Q \times \Sigma \to Q$. The language of an NFA or DFA $A$ is denoted $\mathcal{L}(A)$, which is defined in the standard way [18].


3.1 Program Traces 


$St$ denotes the (possibly infinite) set of program states. For example, a program with two integer variables has $St = \mathbb{Z} \times \mathbb{Z}$. $\mathcal{A} \subseteq 2^{St}$ is a (possibly infinite) set of assertions on program states. $\Sigma$ denotes a finite alphabet of program statements. We refer to a finite string of statements as a (program) trace. For each statement $a \in \Sigma$, we associate a semantics $[\![a]\!] \subseteq St \times St$ and extend $[\![\cdot]\!]$ to traces via (relation) composition. A trace $x \in \Sigma^*$ is said to be infeasible if $[\![x]\!](St) = \emptyset$, where $[\![x]\!](St)$ denotes the image of $St$ under $[\![x]\!]$. To abstract away from a particular program syntax, we define a program as a regular language of traces. The semantics of a program $P$ is simply the union of the semantics of its traces, $[\![P]\!] = \bigcup_{x \in P} [\![x]\!]$. Concretely, one may obtain programs as languages by interpreting their edge-labelled control-flow graphs as DFAs: each vertex in the control-flow graph is a state, and each edge in the control-flow graph is a transition. The control-flow graph entry location is the initial state of the DFA, and all its exit locations are final states.


3.2 Safety 


There are many equivalent notions of program safety; we use non-reachability. A program $P$ is safe if all traces of $P$ are infeasible, i.e., $[\![P]\!](St) = \emptyset$. Standard partial correctness specifications are then represented via a simple encoding. Given a precondition $\phi$ and a postcondition $\psi$, the validity of the Hoare triple $\{\phi\}\, P\, \{\psi\}$ is equivalent to the safety of $[\phi] \cdot P \cdot [\neg\psi]$, where $[\cdot]$ is a standard assume statement (or the singleton set containing it), and $\cdot$ is language concatenation.


Example 3.1. We use determinism as an example of how k-safety can be encoded in the framework defined thus far. If $P$ is a program then determinism of $P$ is equivalent to safety of $[\phi] \cdot (P_1 ⧢ P_2) \cdot [\neg\phi]$ where $P_1$ and $P_2$ are copies of $P$ operating on disjoint variables, ⧢ is the shuffle product of two languages, and $[\phi]$ is an assume statement asserting that the variables in each copy of $P$ are equal.
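The shuffle product and the determinism encoding of Example 3.1 can be sketched for finite languages of traces, with traces as tuples of statement names. Using finite languages in place of regular ones, and the statement names themselves, are assumptions for illustration.

```python
def shuffle_words(u, v):
    """All interleavings of the traces u and v (tuples of statements)."""
    if not u:
        return {v}
    if not v:
        return {u}
    first_u = {(u[0],) + w for w in shuffle_words(u[1:], v)}
    first_v = {(v[0],) + w for w in shuffle_words(u, v[1:])}
    return first_u | first_v


def shuffle(lang1, lang2):
    """Shuffle product L1 ⧢ L2 of two finite languages."""
    return {w for u in lang1 for v in lang2 for w in shuffle_words(u, v)}


def determinism_encoding(p1, p2, same_vars, diff_vars):
    """[φ] · (P1 ⧢ P2) · [¬φ] for copies P1, P2 over disjoint variables."""
    return {(same_vars,) + w + (diff_vars,) for w in shuffle(p1, p2)}


# Two copies of a two-statement program, renamed apart (hypothetical names).
P1 = {("a1", "b1")}
P2 = {("a2", "b2")}
print(len(shuffle(P1, P2)))  # 6 interleavings
```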


A proof is a finite set of assertions $\Pi \subseteq \mathcal{A}$ that includes $\mathit{true}$ and $\mathit{false}$. Each $\Pi$ gives rise to an NFA $\Pi_{\mathrm{NFA}} = (\Pi, \Sigma, \delta_\Pi, \mathit{true}, \{\mathit{false}\})$ where $\delta_\Pi(\phi_{pre}, a) = \{\phi_{post} \mid \llbracket a \rrbracket(\phi_{pre}) \subseteq \phi_{post}\}$. We abbreviate $\mathcal{L}(\Pi_{\mathrm{NFA}})$ as $\mathcal{L}(\Pi)$. Intuitively, $\mathcal{L}(\Pi)$ consists of all traces that can be proven infeasible using only assertions in $\Pi$. Thus the following proof rule is sound [12,13,17]:


$$\frac{\exists \Pi \subseteq \mathcal{A}.\ P \subseteq \mathcal{L}(\Pi)}{P \text{ is safe}}\quad(\text{SAFE})$$


When $P \subseteq \mathcal{L}(\Pi)$, we say that $\Pi$ is a proof for $P$. A proof does not uniquely belong to any particular program; a single $\Pi$ may prove many programs correct.
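The NFA $\Pi_{\mathrm{NFA}}$ and the premise of SAFE can be illustrated on a finite toy state space. In the sketch below (assumptions: states are 0..3, assertions are frozensets of states, and the hypothetical `post(a, phi)` computes $\llbracket a \rrbracket(\phi)$), a trace is in $\mathcal{L}(\Pi)$ iff the subset simulation of $\Pi_{\mathrm{NFA}}$ starting from $\mathit{true}$ reaches $\mathit{false}$; a finite program is proved safe when all its traces are in $\mathcal{L}(\Pi)$.

```python
ST = frozenset(range(4))       # toy state space (assumption): x in 0..3
TRUE, FALSE = ST, frozenset()  # the assertions true and false as sets of states


def post(stmt, phi):
    """[[stmt]](phi) for three hypothetical statements over the variable x."""
    if stmt == "x := 0":
        return frozenset({0}) if phi else frozenset()
    if stmt == "x := x + 1":
        return frozenset(s + 1 for s in phi if s + 1 in ST)
    if stmt == "[x > 0]":
        return frozenset(s for s in phi if s > 0)
    raise ValueError(stmt)


def in_proof_language(trace, proof):
    """trace ∈ L(Π) for Π_NFA = (Π, Σ, δ_Π, true, {false})."""
    current = {TRUE}
    for stmt in trace:
        # δ_Π(φ_pre, a) = {φ_post ∈ Π | [[a]](φ_pre) ⊆ φ_post}
        current = {psi for phi in current for psi in proof if post(stmt, phi) <= psi}
    return FALSE in current


def proves(proof, program):
    """Premise of SAFE for a finite program: P ⊆ L(Π)."""
    return all(in_proof_language(t, proof) for t in program)


X_IS_0 = frozenset({0})
P = {("x := 0", "[x > 0]"), ("x := 0", "x := x + 1", "x := 0", "[x > 0]")}
print(proves({TRUE, FALSE, X_IS_0}, P))  # True
print(proves({TRUE, FALSE}, P))          # False
```

The second call fails because, without the assertion `x == 0`, the only assertion available after `x := 0` is $\mathit{true}$, from which $\mathit{false}$ is never reached.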


4 Reductions 


The set of assertions used for a proof is usually determined by a particular 
language of assertions, and a safe program may not have a (safety) proof in that 
particular language. Yet, a subset of the program traces may have a proof in 
that assertion language. If it can be proven that the subset of program runs that
have a safety proof is a faithful representation of all program behaviours (with
respect to a given property), then the program is correct. This motivates the 
notion of program reductions. 


Definition 4.1 (semantic reduction). If for programs $P$ and $P'$, $P'$ is safe implies that $P$ is safe, then $P'$ is a semantic reduction of $P$ (written $P' \preceq P$).


The definition immediately gives rise to the following proof rule for proving 
program safety: 


$$\frac{\exists P' \preceq P,\ \Pi \subseteq \mathcal{A}.\ P' \subseteq \mathcal{L}(\Pi)}{P \text{ is safe}}\quad(\text{SAFERED1})$$


This generic proof rule is not automatable since, given a proof $\Pi$, verifying the existence of the appropriate reduction is undecidable. Observe that a program is safe if and only if $\emptyset$ is a valid reduction of the program. This means that discovering a semantic reduction and proving safety are mutually reducible to each other. To have decidable premises for the proof rule, we need to formulate an easier (than proving safety) problem in discovering a reduction. One way to achieve this is by restricting the set of reductions under consideration from all reductions (given in Definition 4.1) to a proper subset which is more amenable to algorithmic checking. Fixing a set $\mathcal{R}$ of (semantic) reductions, we have the rule:


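$$\frac{\exists P' \in \mathcal{R}.\ P' \subseteq \mathcal{L}(\Pi) \qquad \forall P' \in \mathcal{R}.\ P' \preceq P}{P \text{ is safe}}\quad(\text{SAFERED2})$$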


Proposition 4.2. The proof rule SAFERED2 is sound. 
The core contribution of this paper is an algorithmic solution inspired by the above proof rule. To achieve this, two subproblems are solved: (1) Given a set $\mathcal{R}$ of reductions of a program $P$ and a candidate proof $\Pi$, can we check if there exists a reduction $P' \in \mathcal{R}$ which is covered by the proof $\Pi$? In Sect. 5, we propose a new semantic interpretation of an existing notion of infinite tree automata that gives rise to an algorithmic check for this step. (2) Given a program $P$, is there a general sound set of reductions $\mathcal{R}$ that can be effectively represented to accommodate step (1)? In Sect. 6, we propose a construction of an effective set of reductions, representable by our infinite tree automata, drawing on existing partial order reduction techniques [15].


5 Proof Checking 


Given a set of reductions $\mathcal{R}$ of a program $P$, and a candidate proof $\Pi$, we want to check if there exists a reduction $P' \in \mathcal{R}$ which is covered by $\Pi$. We call this proof checking. We use tree automata to represent certain classes of languages (i.e. sets of sets of strings), and then use operations on these automata for the purpose of proof checking.

The set $\Sigma^*$ can be represented as an infinite tree. Each $x \in \Sigma^*$ defines a path to a unique node in the tree: the root node is located at the empty string $\epsilon$, and for all $a \in \Sigma$, the node located at $xa$ is a child of the node located at $x$. Each node is then identified by the string labelling the path leading to it. A language $L \subseteq \Sigma^*$ (equivalently, $L : \Sigma^* \to \mathbb{B}$) can consequently be represented as an infinite tree where the node at each $x$ is labelled with the boolean value $B = (x \in L)$. An example is given in Fig. 2.

Fig. 2. Language $\{a\}$ as an infinite tree.

It follows that a set of languages is a set of infinite trees, which can be represented using automata over infinite trees. Looping Tree Automata (LTAs) are a subclass of Büchi Tree Automata where all states are accept states [2]. The class of Looping Tree Automata is closed under intersection and union, and checking emptiness of LTAs is decidable. Unlike for Büchi Tree Automata, emptiness can be decided in linear time [2].


Definition 5.1. A Looping Tree Automaton (LTA) over $|\Sigma|$-ary, $\mathbb{B}$-labelled trees is a tuple $M = (Q, \Delta, q_0)$ where $Q$ is a finite set of states, $\Delta \subseteq Q \times \mathbb{B} \times (\Sigma \to Q)$ is the transition relation, and $q_0$ is the initial state.


Intuitively, an LTA $M = (Q, \Delta, q_0)$ performs a parallel and depth-first traversal of an infinite tree $L$ while maintaining some local state. Execution begins at the root $\epsilon$ from state $q_0$ and non-deterministically picks a transition $(q_0, B, \sigma) \in \Delta$ such that $B$ matches the label at the root of the tree (i.e. $B = (\epsilon \in L)$). If no such transition exists, the tree is rejected. Otherwise, $M$ recursively works on each child $a$ from state $q' = \sigma(a)$ in parallel. This process continues infinitely, and $L$ is accepted if and only if $L$ is never rejected.

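Formally, $M$'s execution over a tree $L$ is characterized by a run $\delta^* : \Sigma^* \to Q$ where $\delta^*(\epsilon) = q_0$ and $(\delta^*(x), x \in L, \lambda a.\, \delta^*(xa)) \in \Delta$ for all $x \in \Sigma^*$. The set of languages accepted by $M$ is then defined as $\mathcal{L}(M) = \{L \mid \exists \delta^*.\ \delta^* \text{ is a run of } M \text{ on } L\}$.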
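The run condition can be checked mechanically on any finite prefix of the tree. The Python sketch below is a bounded check only (an infinite run cannot be checked exhaustively); encoding $\sigma : \Sigma \to Q$ as a frozenset of `(a, q')` pairs, and all function names, are assumptions for illustration.

```python
from itertools import product


def nodes(sigma, depth):
    """All nodes x ∈ Σ* of the tree with |x| <= depth."""
    for n in range(depth + 1):
        for letters in product(sorted(sigma), repeat=n):
            yield "".join(letters)


def is_run_up_to(lta, run, in_lang, sigma, depth):
    """Check the run conditions of δ* on every node of length <= depth.

    lta = (transitions, q0), where each transition is (q, B, children) and
    children is a frozenset of (a, q') pairs encoding σ : Σ → Q.
    Only a finite prefix of the infinite tree is inspected.
    """
    transitions, q0 = lta
    if run("") != q0:
        return False
    for x in nodes(sigma, depth):
        children = frozenset((a, run(x + a)) for a in sigma)
        if (run(x), in_lang(x), children) not in transitions:
            return False
    return True


# An LTA over Σ = {a} whose only transition reads label False: it accepts just ∅.
ONLY_EMPTY = ({("q", False, frozenset({("a", "q")}))}, "q")


def always_q(x):
    return "q"


print(is_run_up_to(ONLY_EMPTY, always_q, lambda x: False, {"a"}, 5))     # True
print(is_run_up_to(ONLY_EMPTY, always_q, lambda x: x == "a", {"a"}, 5))  # False
```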


Theorem 5.2. Given an LTA $M$ and a regular language $L$, it is decidable whether $\exists P \in \mathcal{L}(M).\ P \subseteq L$.


The proof, which appears in [14], reduces the problem to deciding whether $\mathcal{L}(M) \cap \mathcal{P}(L) \neq \emptyset$. LTAs are closed under intersection and have decidable emptiness checks, and the lemma below is the last piece of the puzzle.


Lemma 5.3. If $L$ is a regular language, then $\mathcal{P}(L)$ is recognized by an LTA.


Counterexamples. Theorem 5.2 effectively states that proof checking is decidable. For automated verification, beyond checking the validity of a proof, we require counterexamples to fuel the development of the proof when the proof does not check. Note that in the simple case of the proof rule SAFE, when $P \not\subseteq \mathcal{L}(\Pi)$ there exists a counterexample trace $x \in P$ such that $x \notin \mathcal{L}(\Pi)$.

With our proof rule SAFERED2, things get a bit more complicated. First, note that unlike the classic case (SAFE), where a failed proof check coincides with the non-emptiness of an intersection check (i.e. $P \cap \overline{\mathcal{L}(\Pi)} \neq \emptyset$), in our case, a failed proof check coincides with the emptiness of an intersection check (i.e. $\mathcal{R} \cap \mathcal{P}(\mathcal{L}(\Pi)) = \emptyset$). The sets $\mathcal{R}$ and $\mathcal{P}(\mathcal{L}(\Pi))$ are both sets of languages. What does the witness to the emptiness of the intersection look like? Each language member of $\mathcal{R}$ contains at least one string that does not belong to any of the subsets of our proof language. One can collect all such witness strings to guarantee progress across the board in the next round. However, since LTAs can represent an infinite set of languages, one must take care not to end up with an infinite set of counterexamples following this strategy. Fortunately, this will not be the case.


Theorem 5.4. Let $M$ be an LTA and let $L$ be a regular language such that $P \not\subseteq L$ for all $P \in \mathcal{L}(M)$. There exists a finite set of counterexamples $C$ such that, for all $P \in \mathcal{L}(M)$, there exists some $x \in C$ such that $x \in P$ and $x \notin L$.


The proof appears in [14]. This theorem justifies our choice of using LTAs instead of more expressive formalisms such as Büchi Tree Automata. For example, the Büchi Tree Automaton that accepts the language $\{\{x\} \mid x \in \Sigma^*\}$ would give rise to an infinite number of counterexamples with respect to the empty proof (i.e. $\Pi = \emptyset$). The finiteness of the counterexample set presents an alternative proof that LTAs are strictly less expressive than Büchi Tree Automata [27].
6 Sleep Set Reductions


We have established so far that (1) a set of assertions gives rise to a regular language proof, and (2) given a regular language proof and a set of program reductions recognizable by an LTA, we can check the program (reductions) against the proof. The last piece of the puzzle is to show that a useful class of program reductions can be expressed using LTAs.

Recall our example from Sect. 2. The reduction we obtain is sound because,
for every trace in the full parallel-composition program, an equivalent trace exists 
in the reduced program. By equivalent, we mean that one trace can be obtained 
from the other by swapping independent statements. Such an equivalence is the 
essence of the theory of Mazurkiewicz traces [9]. 

We fix a reflexive symmetric dependence relation $D \subseteq \Sigma \times \Sigma$. For all $a, b \in \Sigma$, we say that $a$ and $b$ are dependent if $(a, b) \in D$, and say they are independent otherwise. We define $\sim_D$ as the smallest congruence satisfying $xaby \sim_D xbay$ for all $x, y \in \Sigma^*$ and independent $a, b \in \Sigma$. The closure of a language $L \subseteq \Sigma^*$ with respect to $\sim_D$ is denoted $[L]_D$. A language $L$ is $\sim_D$-closed if $L = [L]_D$. It is worthwhile to note that all input programs considered in this paper correspond to regular languages that are $\sim_D$-closed.

An equivalence class of $\sim_D$ is typically called a (Mazurkiewicz) trace. We avoid using this terminology as it conflicts with our definition of traces as strings of statements in Sect. 3.1. We assume $D$ is sound, i.e. $\llbracket ab \rrbracket = \llbracket ba \rrbracket$ for all independent $a, b \in \Sigma$.
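For a finite language, the closure $[L]_D$ can be computed by repeatedly swapping adjacent independent letters until no new word appears, since $\sim_D$ is generated by exactly these swaps. The Python sketch below assumes single-character statements and passes $D$ as a predicate; the names `closure` and `is_closed` are illustrative.

```python
from collections import deque


def closure(lang, dependent):
    """[L]_D for a finite language of strings; dependent(a, b) encodes D."""
    seen = set(lang)
    queue = deque(lang)
    while queue:
        w = queue.popleft()
        for i in range(len(w) - 1):
            a, b = w[i], w[i + 1]
            if not dependent(a, b):  # xaby ~_D xbay for independent a, b
                v = w[:i] + b + a + w[i + 2:]
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
    return seen


def is_closed(lang, dependent):
    """L is ~_D-closed iff L = [L]_D."""
    return closure(lang, dependent) == set(lang)


# D: reflexive, plus a and c are dependent; b is independent of a and c.
def dep(a, b):
    return a == b or {a, b} == {"a", "c"}


print(sorted(closure({"abc"}, dep)))  # ['abc', 'acb', 'bac']
```

Note that `cba` is not reachable: `a` must stay before `c` because the two are dependent.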


Definition 6.1 (D-reduction). A program $P'$ is a $D$-reduction of a program $P$, written $P' \preceq_D P$, if $[P']_D = P$.


Note that the equivalence relation on programs induced by $\sim_D$ is a refinement of the semantic equivalence relation used in Definition 4.1.


Lemma 6.2. If $P' \preceq_D P$ then $P' \preceq P$.


Ideally, we would like to define an LTA that accepts all D-reductions of a 
program P, but unfortunately this is not possible in general. 


Proposition 6.3 (corollary of Theorem 67 of [9]). For arbitrary regular languages $L_1, L_2 \subseteq \Sigma^*$ and relation $D$, the proposition $\exists L \preceq_D L_1.\ L \subseteq L_2$ is undecidable.


The proposition is decidable only when D is transitive, which does not hold for 
a semantically correct notion of independence for a parallel program encoding 
a k-safety property, since statements from the same thread are dependent and 
statements from different program copies are independent. Therefore, we have: 


Proposition 6.4. Assume $P$ is a $\sim_D$-closed program and $\Pi$ is a proof. The proposition $\exists P' \preceq_D P.\ P' \subseteq \mathcal{L}(\Pi)$ is undecidable.
In order to have a decidable premise for proof rule SAFERED2, then, we present an approximation of the set of $D$-reductions, inspired by sleep sets [15]. The idea is to construct an LTA that recognizes a class of $D$-reductions of an input program $P$, whose language is assumed to be $\sim_D$-closed. This automaton intuitively makes non-deterministic choices about which program traces to prune in favour of other $\sim_D$-equivalent program traces for a given reduction. Different non-deterministic choices lead to different $D$-reductions.

Consider two statements $a, b \in \Sigma$ where $(a, b) \notin D$. Let $x, y \in \Sigma^*$ and consider two program runs $xaby$ and $xbay$. We know $\llbracket xbay \rrbracket = \llbracket xaby \rrbracket$. If the automaton makes a non-deterministic choice that the successors of $xa$ have been explored, then the successors of $xba$ need not be explored (they can be pruned away), as illustrated in Fig. 3. Now assume $(a, c) \in D$, for some $c \in \Sigma$. When the node $xbc$ is being explored, we can no longer safely ignore $a$-transitions, since the equality $\llbracket xbcay \rrbracket = \llbracket xabcy \rrbracket$ is not guaranteed. Therefore, the $a$ successor of $xbc$ has to be explored. The non-deterministic choice of which child node to explore is modelled by a choice of order in which we explore each node's children. Different orders yield different reductions. Reductions are therefore characterized as an assignment $R : \Sigma^* \to \mathrm{Lin}(\Sigma)$ from nodes to linear orderings on $\Sigma$, where $(a, b) \in R(x)$ means we explore child $xa$ after child $xb$.

Fig. 3. Exploring from $x$ with sleep sets.

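Given $R : \Sigma^* \to \mathrm{Lin}(\Sigma)$, the sleep set $\mathrm{sleep}_R(x) \subseteq \Sigma$ at node $x \in \Sigma^*$ defines the set of transitions that can be ignored at $x$:

$$\mathrm{sleep}_R(\epsilon) = \emptyset \qquad (1)$$

$$\mathrm{sleep}_R(xa) = (\mathrm{sleep}_R(x) \cup R(x)(a)) \setminus D(a) \qquad (2)$$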

Intuitively, (1) no transition can be ignored at the root node, since nothing has been explored yet, and (2) at node $x$, the sleep set of $xa$ is obtained by adding the transitions we explored before $a$ ($R(x)(a)$) and then removing the ones that conflict with $a$ (i.e. are related to $a$ by $D$). Next, we define the nodes that are ignored. The set of ignored nodes is the smallest set $\mathrm{ignore}_R : \Sigma^* \to \mathbb{B}$ such that

$$x \in \mathrm{ignore}_R \implies xa \in \mathrm{ignore}_R \qquad (1)$$

$$a \in \mathrm{sleep}_R(x) \implies xa \in \mathrm{ignore}_R \qquad (2)$$


Intuitively, a node $xa$ is ignored if (1) any of its ancestors is ignored ($x \in \mathrm{ignore}_R$), or (2) $a$ is one of the ignored transitions at node $x$ ($a \in \mathrm{sleep}_R(x)$).

Finally, we obtain an actual reduction of a program $P$ from a characterization $R$ of a reduction by removing the ignored nodes from $P$, i.e. $P \setminus \mathrm{ignore}_R$.
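For a finite program and a fixed characterization $R$, equations (1) and (2) can be evaluated directly along each trace, since whether $x$ is ignored depends only on the nodes on the path to $x$. In the Python sketch below, representing $R$ as a function from a prefix to a string listing $\Sigma$ in exploration order is an assumption, and the examples assign the same order to every node; the function names are illustrative.

```python
def is_ignored(x, order_at, dep):
    """x ∈ ignore_R, computing sleep_R along the path to x (equations (1)-(2)).

    order_at(prefix) returns R(prefix) as a string listing Σ in exploration
    order; R(prefix)(a) is then the set of letters explored before a.
    dep[a] is D(a), the set of letters dependent on a (including a).
    """
    sleep = frozenset()  # sleep_R(ε) = ∅
    for i, a in enumerate(x):
        if a in sleep:  # a ∈ sleep_R(x[:i]), so x[:i+1] and all its descendants are ignored
            return True
        order = order_at(x[:i])
        explored_before = frozenset(order[: order.index(a)])
        sleep = (sleep | explored_before) - dep[a]  # sleep_R(x[:i] a)
    return False


def reduce_program(program, order_at, dep):
    """P ∖ ignore_R for a finite program P."""
    return {x for x in program if not is_ignored(x, order_at, dep)}


# Σ = {a, b, c}: a and c are dependent, b is independent of both.
D = {"a": frozenset("ac"), "b": frozenset("b"), "c": frozenset("ac")}
P = {"abc", "bac", "acb"}  # a ~_D-closed program
print(reduce_program(P, lambda prefix: "abc", D))  # {'abc'}
```

Choosing a different order yields a different representative: with `lambda prefix: "bac"` the result is `{'bac'}`. In both cases exactly one member of the equivalence class survives.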


Lemma 6.5. For all $R : \Sigma^* \to \mathrm{Lin}(\Sigma)$, if $P$ is a $\sim_D$-closed program then $P \setminus \mathrm{ignore}_R$ is a $D$-reduction of $P$.
The set of all such reductions is $\mathrm{reduce}_D(P) = \{P \setminus \mathrm{ignore}_R \mid R : \Sigma^* \to \mathrm{Lin}(\Sigma)\}$.

Theorem 6.6. For any regular language $P$, $\mathrm{reduce}_D(P)$ is accepted by an LTA.


Interestingly, every reduction in $\mathrm{reduce}_D(P)$ is optimal in the sense that each reduction contains at most one representative of each equivalence class of $\sim_D$.


Theorem 6.7. Fix some $P \subseteq \Sigma^*$ and $R : \Sigma^* \to \mathrm{Lin}(\Sigma)$. For all $x, y \in P \setminus \mathrm{ignore}_R$, if $x \sim_D y$ then $x = y$.


7 Algorithms 


Figure 4 illustrates the outline of our verification algorithm. It is a counterexample-guided abstraction refinement loop in the style of [12,13,17]. The key difference is that instead of checking whether some proof $\Pi$ is a proof for the program $P$, it checks if there exists a reduction of the program $P$ that $\Pi$ proves correct.

Fig. 4. Counterexample-guided refinement loop. (Inputs: program $P$, dependence relation $D$, initial empty proof $\Pi$. If some reduction of $P$ is covered by $\Pi$, program $P$ is verified; otherwise a set $C$ of counterexamples is produced. If any member of $C$ is a valid counterexample, program $P$ is incorrect; otherwise a proof $\Pi'$ for the infeasibility of everything in $C$ is constructed, $\Pi = \Pi \cup \Pi'$, and the loop repeats.)

The algorithm relies on an oracle INTERPOLATE that, given a finite set of program traces $C$, returns a proof $\Pi'$, if one exists, such that $C \subseteq \mathcal{L}(\Pi')$. In our tool, we use Craig interpolation to implement the oracle INTERPOLATE. In general, since program traces are the simplest form of sequential programs (loop and branch free), any automated program prover that can handle proving them may be used.
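A schematic Python rendering of the loop in Fig. 4, with the three oracles passed in as functions, may help fix the control flow. All names are hypothetical: `find_counterexamples` stands for the LTA-based proof check and counterexample extraction of Sects. 5 and 7, `interpolate` stands for INTERPOLATE, and $\Pi$ is represented as a plain set of assertion names with $\mathit{true}$ and $\mathit{false}$ left implicit.

```python
def verify(program, find_counterexamples, is_feasible, interpolate):
    """Counterexample-guided refinement loop in the shape of Fig. 4.

    find_counterexamples(program, proof) returns an empty set when some reduction
    of the program is covered by the proof, and a finite set C otherwise.
    is_feasible(trace) decides whether a trace is a real counterexample, and
    interpolate(C) returns assertions proving every trace in C infeasible.
    """
    proof = frozenset()  # initial empty proof Π
    while True:
        counterexamples = find_counterexamples(program, proof)
        if not counterexamples:
            return "verified", proof
        for trace in counterexamples:
            if is_feasible(trace):
                return "incorrect", trace
        proof = proof | interpolate(counterexamples)  # Π := Π ∪ Π'


# Stub oracles (hypothetical): the single counterexample "t" needs assertion "φ".
def find_cex(program, proof):
    return set() if "φ" in proof else {"t"}


print(verify("P", find_cex, lambda t: False, lambda c: frozenset({"φ"})))
# ('verified', frozenset({'φ'}))
```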

The results presented in Sects. 5 and 6 give rise to the proof checking subroutine of the algorithm in Fig. 4 (i.e. the light grey test). Given a program DFA $A_P = (Q_P, \Sigma, \delta_P, q_{0P}, F_P)$ and a proof DFA $A_\Pi = (Q_\Pi, \Sigma, \delta_\Pi, q_{0\Pi}, F_\Pi)$ (obtained by determinizing $\Pi_{\mathrm{NFA}}$), we can decide $\exists P' \in \mathrm{reduce}_D(\mathcal{L}(A_P)).\ P' \subseteq \mathcal{L}(A_\Pi)$ by constructing an LTA $M_{P\Pi}$ for $\mathrm{reduce}_D(\mathcal{L}(A_P)) \cap \mathcal{P}(\mathcal{L}(A_\Pi))$ and checking emptiness (Theorem 5.2).




7.1 Progress 


The algorithm corresponding to Fig. 4 satisfies a weak progress theorem: none of the counterexamples from a round of the algorithm will ever appear in a future counterexample set. This, however, is not strong enough to guarantee termination. Alternatively, one can think of the algorithm's progress as follows. In each round new assertions are discovered through the oracle INTERPOLATE, and one can optimistically hope to eventually converge on an existing target proof $\Pi^*$. The success of this algorithm depends on two factors: (1) the counterexamples used by the algorithm belong to $\mathcal{L}(\Pi^*)$ and (2) the proof that INTERPOLATE discovers for these counterexamples coincides with $\Pi^*$. The latter is a well-known wild card in software model checking, which cannot be guaranteed; there is plenty of empirical evidence, however, that procedures based on Craig interpolation do well in approximating it. The former is a new problem for our refinement loop.

In a standard algorithm in the style of [12,13,17], the verification proof rule dictates that every program trace must be in $\mathcal{L}(\Pi^*)$. In our setting, we only require a subset (corresponding to some reduction) to be in $\mathcal{L}(\Pi^*)$. This means one cannot simply rely on program traces as appropriate counterexamples. Theorem 5.4 presents a solution to this problem. It ensures that we always feed INTERPOLATE some counterexample from $\mathcal{L}(\Pi^*)$ and therefore guarantee progress.


Theorem 7.1 (Strong Progress). Assume a proof $\Pi^*$ exists for some reduction $P^* \in \mathcal{R}$ and INTERPOLATE always returns some subset of $\Pi^*$ for traces in $\mathcal{L}(\Pi^*)$. Then the algorithm will terminate in at most $|\Pi^*|$ iterations.


Theorem 7.1 ensures that the algorithm will never get into an infinite loop due to a bad choice of counterexamples. The condition on INTERPOLATE ensures that divergence does not occur due to the wrong choice of assertions by INTERPOLATE; without it, any standard interpolation-based software model checking algorithm may diverge. The assumption that there exists a proof for a reduction of the program in the fixed set $\mathcal{R}$ ensures that the proof checking procedure can verify the target proof $\Pi^*$ once it is reached. Note that, in general, a proof may exist for a reduction of the program which is not in $\mathcal{R}$. Therefore, the algorithm is not complete with respect to all reductions, since checking the premises of SAFERED1 is undecidable as discussed in Sect. 4.


7.2 Faster Proof Checking Through Antichains 


The state set of $M_{P\Pi}$, the intersection of program and proof LTAs, has size $|Q_P \times \mathbb{B} \times \mathcal{P}(\Sigma) \times Q_\Pi|$, which is exponential in $|\Sigma|$. Therefore, even a linear emptiness test for this LTA can be computationally expensive. Antichains have been previously used [8] to optimize certain operations over NFAs that also suffer from exponential blowups, such as deciding universality and inclusion tests. The main idea is that these operations involve computing downwards-closed and upwards-closed sets according to an appropriate subsumption relation, which can be represented compactly as antichains. We employ similar techniques to propose a new emptiness check algorithm.


Antichains. The set of maximal elements of a set $X$ with respect to some ordering relation $\sqsubseteq$ is denoted $\max(X)$. The downwards-closure of a set $X$ with respect to $\sqsubseteq$ is denoted $\lfloor X \rfloor$. An antichain is a set $X$ where no element of $X$ is related (by $\sqsubseteq$) to another. The maximal elements $\max(X)$ of a finite set $X$ form an antichain. If $X$ is downwards-closed then $\lfloor \max(X) \rfloor = X$.
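For the special case where $\sqsubseteq$ is set inclusion (the case used below for sleep sets), $\max$ and membership in $\lfloor X \rfloor$ can be sketched as follows; the function names are illustrative. Membership is decided from the antichain alone, which is what makes the compact representation usable.

```python
def maximal(family):
    """max(X) under ⊆: the sets in X not strictly contained in another member."""
    family = {frozenset(s) for s in family}
    return {s for s in family if not any(s < t for t in family)}


def in_down_closure(s, antichain):
    """s ∈ ⌊X⌋, given X only through its antichain of maximal elements."""
    return any(frozenset(s) <= t for t in antichain)


X = [{1}, {2}, {1, 2}, {3}, set()]  # a downwards-closed family
top = maximal(X)
print(sorted(sorted(s) for s in top))  # [[1, 2], [3]]
```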

The emptiness check algorithm for LTAs from [2] computes the set of inactive states (i.e. states which generate an empty language) and checks if the initial state is inactive. The set of inactive states of an LTA $M = (Q, \Delta, q_0)$ is defined as the smallest set $\mathrm{inactive}(M)$ satisfying

$$\frac{\forall (q, B, \sigma) \in \Delta.\ \exists a.\ \sigma(a) \in \mathrm{inactive}(M)}{q \in \mathrm{inactive}(M)}\quad(\text{INACTIVE})$$


Alternatively, one can view $\mathrm{inactive}(M)$ as the least fixed-point of a monotone (with respect to $\subseteq$) function $F_M : \mathcal{P}(Q) \to \mathcal{P}(Q)$ where

$$F_M(X) = \{q \mid \forall (q, B, \sigma) \in \Delta.\ \exists a.\ \sigma(a) \in X\}.$$

Therefore, $\mathrm{inactive}(M)$ can be computed using a standard fixpoint algorithm.
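A direct (non-antichain) rendering of this fixpoint computation is sketched below in Python, using naive Kleene iteration from $\emptyset$ rather than the linear-time algorithm of [2]. Transitions are encoded as `(q, B, sigma)` triples with `sigma` a dict from letters to states; this encoding and the names are assumptions.

```python
def inactive_states(states, transitions):
    """Least fixed point of F_M(X) = {q | ∀(q, B, σ) ∈ Δ. ∃a. σ(a) ∈ X}.

    transitions is a list of (q, B, sigma) with sigma a dict from letters to states.
    A state with no outgoing transitions is (vacuously) inactive.
    """
    inactive = set()
    while True:  # Kleene iteration from ∅; F_M is monotone, so this terminates
        new = {
            q
            for q in states
            if all(
                any(t in inactive for t in sigma.values())
                for (p, _label, sigma) in transitions
                if p == q
            )
        }
        if new == inactive:
            return inactive
        inactive = new


def is_empty(states, transitions, q0):
    """L(M) = ∅ iff the initial state is inactive."""
    return q0 in inactive_states(states, transitions)


# Σ = {a}: p loops on label True forever; r must move to s, which has no transitions.
STATES = {"p", "r", "s"}
DELTA = [("p", True, {"a": "p"}), ("r", False, {"a": "s"})]
print(sorted(inactive_states(STATES, DELTA)))  # ['r', 's']
```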

If $\mathrm{inactive}(M)$ is downwards-closed with respect to some subsumption relation $(\sqsubseteq) \subseteq Q \times Q$, then we need not represent all of $\mathrm{inactive}(M)$. The antichain $\max(\mathrm{inactive}(M))$ of maximal elements of $\mathrm{inactive}(M)$ (with respect to $\sqsubseteq$) would be sufficient to represent the entirety of $\mathrm{inactive}(M)$, and can be exponentially smaller than $\mathrm{inactive}(M)$, depending on the choice of relation $\sqsubseteq$.

A trivial way to compute $\max(\mathrm{inactive}(M))$ is to first compute $\mathrm{inactive}(M)$ and then find the maximal elements of the result, but this involves doing strictly more work than the baseline algorithm. However, observe that if $F_M$ also preserves downwards-closedness with respect to $\sqsubseteq$, then

$$\max(\mathrm{inactive}(M)) = \max(\mathrm{lfp}(F_M)) = \max(\mathrm{lfp}(F_M \circ \lfloor\cdot\rfloor \circ \max)) = \mathrm{lfp}(\max \circ F_M \circ \lfloor\cdot\rfloor)$$

That is, $\max(\mathrm{inactive}(M))$ is the least fixed-point of a function $F^{\max}_M : \mathcal{P}(Q) \to \mathcal{P}(Q)$ defined as $F^{\max}_M(X) = \max(F_M(\lfloor X \rfloor))$. We can calculate $\max(\mathrm{inactive}(M))$ efficiently if we can calculate $F^{\max}_M(X)$ efficiently, which is true in the special case of the intersection automaton for the languages of our proof $\mathcal{P}(\mathcal{L}(\Pi))$ and our program $\mathrm{reduce}_D(P)$, which we refer to as $M_{P\Pi}$.

We are most interested in the state space of $M_{P\Pi}$, which is $Q_{P\Pi} = (Q_P \times \mathbb{B} \times \mathcal{P}(\Sigma)) \times Q_\Pi$. Observe that states whose $\mathbb{B}$ part is $\top$ are always active:


Lemma 7.2. $((q_P, \top, S), q_\Pi) \notin \mathrm{inactive}(M_{P\Pi})$ for all $q_P \in Q_P$, $q_\Pi \in Q_\Pi$, and $S \subseteq \Sigma$.


The state space can then be assumed to be $Q_{P\Pi} = (Q_P \times \{\bot\} \times \mathcal{P}(\Sigma)) \times Q_\Pi$ for the purposes of checking inactivity. The subsumption relation defined as the smallest relation $\sqsubseteq_{P\Pi}$ satisfying

$$S \subseteq S' \implies ((q_P, \bot, S), q_\Pi) \sqsubseteq_{P\Pi} ((q_P, \bot, S'), q_\Pi)$$

for all $q_P \in Q_P$, $q_\Pi \in Q_\Pi$, and $S, S' \subseteq \Sigma$, is a suitable one since:
Lemma 7.3. $F_{M_{P\Pi}}$ preserves downwards-closedness with respect to $\sqsubseteq_{P\Pi}$.


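The function $F^{\max}_{M_{P\Pi}}$ is a function over relations

$$F^{\max}_{M_{P\Pi}} : \mathcal{P}((Q_P \times \{\bot\} \times \mathcal{P}(\Sigma)) \times Q_\Pi) \to \mathcal{P}((Q_P \times \{\bot\} \times \mathcal{P}(\Sigma)) \times Q_\Pi)$$

but in our case it is more convenient to view it as a function over functions

$$F^{\max}_{M_{P\Pi}} : (Q_P \times \{\bot\} \times Q_\Pi \to \mathcal{P}(\mathcal{P}(\Sigma))) \to (Q_P \times \{\bot\} \times Q_\Pi \to \mathcal{P}(\mathcal{P}(\Sigma)))$$

Through some algebraic manipulation and some simple observations, we can define $F^{\max}_{M_{P\Pi}}$ functionally as follows.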


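Lemma 7.4. For all $q_P \in Q_P$, $q_\Pi \in Q_\Pi$, and $X : Q_P \times \{\bot\} \times Q_\Pi \to \mathcal{P}(\mathcal{P}(\Sigma))$,

$$F^{\max}_{M_{P\Pi}}(X)(q_P, \bot, q_\Pi) = \begin{cases} \{\Sigma\} & \text{if } q_P \in F_P \wedge q_\Pi \notin F_\Pi \\ \displaystyle\bigsqcap_{R \in \mathrm{Lin}(\Sigma)} \bigsqcup_{a \in \Sigma} \max\{(S \cup D(a)) \setminus \{a\} \mid S \in X(q'_P, \bot, q'_\Pi) \wedge R(a) \setminus D(a) \subseteq S\} & \text{otherwise} \end{cases}$$

where

$$q'_P = \delta_P(q_P, a), \qquad q'_\Pi = \delta_\Pi(q_\Pi, a), \qquad X \sqcap Y = \max\{x \cap y \mid x \in X \wedge y \in Y\}, \qquad X \sqcup Y = \max(X \cup Y).$$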


    function Check(A_P, A_Π, D)
        (Q_P, Σ, δ_P, q_0P, F_P) ← A_P
        (Q_Π, Σ, δ_Π, q_0Π, F_Π) ← A_Π
        function FMax(X)((q_P, ⊥, q_Π))
            if q_P ∈ F_P ∧ q_Π ∉ F_Π
                return {Σ}
            X' ← {Σ}
            for R ∈ Lin(Σ)
                X'' ← ∅
                for a ∈ Σ, S ∈ X((δ_P(q_P, a), ⊥, δ_Π(q_Π, a)))
                    if R(a) \ D(a) ⊆ S
                        X'' ← X'' ∪ {(S ∪ D(a)) \ {a}}
                X' ← X' ⊓ X''
            return X'
        return Fix(FMax)((q_0P, ⊥, q_0Π)) ≠ ∅

Algorithm 1. Proof checking algorithm
A full justification appears in [14]. Formulating $F^{\max}_{M_{P\Pi}}$ as a higher-order function allows us to calculate $\max(\mathrm{inactive}(M_{P\Pi}))$ using efficient fixpoint algorithms like the one in [22]. Algorithm 1 outlines our proof checking routine. $\mathrm{Fix} : ((A \to B) \to (A \to B)) \to (A \to B)$ is a procedure that computes the least fixpoint of its input. The algorithm simply computes the least fixpoint of FMax (the function $F^{\max}_{M_{P\Pi}}$ of Lemma 7.4), which is a compact representation of $\mathrm{inactive}(M_{P\Pi})$, and checks whether the start state of $M_{P\Pi}$ is in it.
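Under simplifying assumptions, Algorithm 1 can be turned into a small executable Python sketch. It enumerates all of $\mathrm{Lin}(\Sigma)$ (so it is only usable for tiny alphabets), uses naive Kleene iteration instead of the fixpoint algorithm of [22], represents $R(a)$ as the set of letters preceding $a$ in a permutation, and returns the same boolean as the last line of Algorithm 1 (true iff the start state is inactive). The helper `finite_dfa` and all other names are hypothetical.

```python
from itertools import permutations


def maximal(family):
    """Antichain of ⊆-maximal elements."""
    family = set(family)
    return frozenset(s for s in family if not any(s < t for t in family))


def meet(xs, ys):
    """X ⊓ Y = max{x ∩ y | x ∈ X, y ∈ Y}."""
    return maximal(x & y for x in xs for y in ys)


def check(dfa_p, dfa_pi, dep):
    """True iff the start state of M_PΠ is inactive, i.e. no reduction in
    reduce_D(L(A_P)) is covered by L(A_Π) (mirrors Fix(FMax)(...) ≠ ∅).

    DFAs are (states, sigma, delta, q0, finals) with delta a total dict
    (state, letter) -> state; dep[a] is D(a) and must contain a.
    """
    states_p, sigma, delta_p, q0_p, finals_p = dfa_p
    states_pi, _, delta_pi, q0_pi, finals_pi = dfa_pi
    sigma = frozenset(sigma)
    orders = list(permutations(sorted(sigma)))  # Lin(Σ), no partition optimization

    def fmax(x, q_p, q_pi):
        if q_p in finals_p and q_pi not in finals_pi:
            return frozenset({sigma})
        result = frozenset({sigma})  # X' ← {Σ}, the identity of ⊓
        for order in orders:
            before = {a: frozenset(order[:i]) for i, a in enumerate(order)}  # R(a)
            joined = set()  # X''
            for a in sigma:
                for s in x[delta_p[q_p, a], delta_pi[q_pi, a]]:
                    if before[a] - dep[a] <= s:
                        joined.add((s | dep[a]) - {a})
            result = meet(result, maximal(joined))
        return result

    # Kleene iteration from the empty antichain everywhere (least fixpoint).
    x = {(q_p, q_pi): frozenset() for q_p in states_p for q_pi in states_pi}
    while True:
        new = {q: fmax(x, *q) for q in x}
        if new == x:
            return len(x[q0_p, q0_pi]) > 0
        x = new


def finite_dfa(words, sigma):
    """Hypothetical helper: total DFA (trie plus a sink state None) for a finite language."""
    prefixes = {w[:i] for w in words for i in range(len(w) + 1)} | {""}
    states = prefixes | {None}
    delta = {
        (q, a): (q + a if q is not None and q + a in prefixes else None)
        for q in states
        for a in sigma
    }
    return states, frozenset(sigma), delta, "", frozenset(words)


INDEP = {"a": frozenset("a"), "b": frozenset("b")}  # a and b independent
P = finite_dfa({"ab", "ba"}, "ab")
print(check(P, finite_dfa({"ab"}, "ab"), INDEP))  # False: the reduction {ab} is covered
print(check(P, finite_dfa(set(), "ab"), INDEP))   # True: no reduction is covered
```

With `a` and `b` independent, the program {ab, ba} has the reductions {ab} and {ba}, so a proof covering only `ab` suffices. If `a` and `b` are made dependent, no pruning is possible and the same proof is rejected.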


Counterexamples. Theorem 5.4 states that a finite set of counterexamples exists whenever $\exists P' \in \mathrm{reduce}_D(P).\ P' \subseteq \mathcal{L}(\Pi)$ does not hold. The proof of emptiness for an LTA, formed using rule INACTIVE above, is a finite tree. Each edge in the tree is labelled by an element of $\Sigma$ (obtained from the existential in the rule) and the paths through this tree form the counterexample set. To compute this set, then, it suffices to remember enough information during the computation of $\mathrm{inactive}(M)$ to reconstruct the proof tree. Every time a state $q$ is determined to be inactive, we must also record the witness $a \in \Sigma$ for each transition $(q, B, \sigma) \in \Delta$ such that $\sigma(a) \in \mathrm{inactive}(M)$.

In an antichain-based algorithm, once we determine a state $q$ to be inactive, we simultaneously determine everything it subsumes (i.e. every $q' \sqsubseteq q$) to be inactive as well. If we record unique witnesses for each and every state that $q$ subsumes, then the space complexity of our antichain algorithm will be the same as that of the unoptimized version. The following lemma states that it is sufficient to record witnesses only for $q$ and discard witnesses for states that $q$ subsumes.


Lemma 7.5. Fix some states $q, q'$ such that $q' \sqsubseteq_{P\Pi} q$. A witness used to prove $q$ is inactive can also be used to prove $q'$ is inactive.


Note that this means that the antichain algorithm soundly returns potentially 
fewer counterexamples than the original one. 


7.3 Partition Optimization 


The LTA construction for $\mathrm{reduce}_D(P)$ involves a nondeterministic choice of linear order at each state. Since $|\mathrm{Lin}(\Sigma)| = |\Sigma|!$, each state in the automaton would have a large number of transitions. As an optimization, our algorithm selects ordering relations out of $\mathrm{Part}(\Sigma)$ (instead of $\mathrm{Lin}(\Sigma)$), defined as $\mathrm{Part}(\Sigma) = \{\Sigma_1 \times \Sigma_2 \mid \Sigma_1 \uplus \Sigma_2 = \Sigma\}$ where $\uplus$ is disjoint union. This leads to a sound algorithm which is not complete with respect to sleep set reductions and trades the factorial complexity of computing $\mathrm{Lin}(\Sigma)$ for an exponential one.
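To see the trade-off concretely, the Python sketch below enumerates both choice sets for small alphabets. $|\mathrm{Lin}(\Sigma)| = n!$, while $\mathrm{Part}(\Sigma)$ has one relation per ordered split, i.e. at most $2^n$; the two splits with an empty side give the same empty relation, so there are $2^n - 1$ distinct relations. The function names are illustrative.

```python
from itertools import permutations, product


def linear_orders(sigma):
    """Lin(Σ): all permutations of Σ."""
    return list(permutations(sorted(sigma)))


def partition_orders(sigma):
    """Part(Σ) = {Σ1 × Σ2 | Σ1 ⊎ Σ2 = Σ}, one relation per ordered split."""
    letters = sorted(sigma)
    relations = set()
    for side in product((1, 2), repeat=len(letters)):
        s1 = [a for a, k in zip(letters, side) if k == 1]
        s2 = [a for a, k in zip(letters, side) if k == 2]
        relations.add(frozenset(product(s1, s2)))
    return relations


for n in range(1, 7):
    sigma = "abcdef"[:n]
    print(n, len(linear_orders(sigma)), len(partition_orders(sigma)))
```

Each printed row has the form `n, n!, 2^n - 1`: the counts coincide for $n \le 3$ up to a small constant, and from $n = 4$ on the partition choices are strictly fewer (24 vs. 15, then 120 vs. 31).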


8 Experimental Results 


To evaluate our approach, we have implemented our algorithm in a tool called WEAVER written in Haskell. WEAVER accepts a program written in a simple imperative language as input, where the property is already encoded in the program in the form of assume statements, and attempts to prove the program correct. The dependence relation for each input program is computed using a heuristic that ensures $\sim_D$-closedness. It is based on the fact that the shuffle product (i.e. parallel composition) of two $\sim_D$-closed languages is $\sim_D$-closed.

WEAVER employs two verification algorithms: (1) the total order algorithm presented in Algorithm 1, and (2) the variation with the partition optimization discussed in Sect. 7.3. It also implements multiple counterexample generation algorithms: (1) Naive: selects the first counterexample in the difference of the program and proof language. (2) Progress-Ensuring: selects a set of counterexamples satisfying Theorem 5.4. (3) Bounded Progress-Ensuring: selects a few counterexamples (in most cases just one) from the set computed by the progress-ensuring algorithm. Our experimentation demonstrated that in the vast majority of the cases, the bounded progress-ensuring algorithm (an instance of the partition algorithm) is the fastest of all options. Therefore, all our reports in this section use this instance of the algorithm.

For the larger benchmarks, we use a simple sound optimization to reduce 
the proof size. We declare the basic blocks of code as atomic, so that internal 
assertions need not be generated for them as part of the proof. This optimization 
is incomplete with respect to sleep set reductions. 


Benchmarks. We use a set of sequential benchmarks from [24] and include 
additional sequential benchmarks that involve more interesting reductions in 
their proofs. We have a set of parallel benchmarks, which are beyond the scope 
of previous hypersafety verification techniques. We use these benchmarks to 
demonstrate that our technique/tool can seamlessly handle concurrency. These 
involve proving concurrency specific hypersafety properties such as determinism 
and equivalence of parallel and sequential implementations of algorithms. Finally, 
since the proof checking algorithm is the core contribution of this paper, we have 
a contrived set of instances to stress test our algorithm. These involve proving 
determinism of simple parallel-disjoint programs with various numbers of threads 
and statements per thread. These benchmarks have been designed to cause a 
combinatorial explosion for the proof checker and counterexample generation 
routines. More information on the benchmarks can be found in [14]. 


Evaluation 


Due to space restrictions, it is not feasible to include a detailed account of all 
our experiments here, for over 50 benchmarks. A detailed table can be found in 
[14]. Table 1 includes a summary in the form of averages, and here, we discuss 
our top findings. 

Proof construction time refers to the time spent to construct $\mathcal{L}(\Pi)$ from a given set of assertions $\Pi$ and excludes the time to produce proofs for the counterexamples in a given round. Proof checking time is the time spent to check if the current proof candidate is strong enough for a reduction of the program. In the fastest instances (total time around 0.01 s), roughly equal time is spent in proof checking and proof construction. In the slowest instances, the total time is almost entirely spent in proof construction. In contrast, in our stress tests (designed to stress the proof checking algorithm) the majority of the time is spent in proof checking. The time spent in proving counterexamples correct is negligible in all instances. Proof sizes vary from 4 assertions to 298 for the most complicated instance. Verification times are correlated with the final proof size; larger proofs tend to cause longer verification times.

Table 1. Experimental results averages for benchmark groups.

| Benchmark group | Group count | Proof size | Number of refinement rounds | Proof construction time | Proof checking time | Total time |
| Looping programs of [24], 2-safety properties | 5 | 63 | 12 | 46.69 s | 0.1 s | 47.03 s |
| Looping programs of [24], 3-safety properties | 8 | 155 | 22 | 475.78 s | 11.79 s | 448.36 s |
| Loop-free programs of [24] | 27 | 5 | 2 | 0.13 s | 0.0004 s | 0.15 s |
| Our sequential benchmarks | 13 | 30 | 9 | 14.27 s | 2.5 s | 17.94 s |
| Our parallel benchmarks | 7 | 31 | 8 | 17.95 s | 0.56 s | 18.63 s |

Numbers of refinement rounds vary from 2 for the simplest to 33 for the most complicated instance. A small number of refinement rounds (e.g. 2) implies a fast verification time. However, for higher numbers of rounds, there is no strong positive correlation between the number of rounds and verification time.

For our parallel programs benchmarks (other than our stress tests), the 
tool spends the majority of its time in proof construction. Therefore, we designed 
specific (unusual) parallel programs to stress test the proof checker. Stress test 
benchmarks are trivial tests of determinism of disjoint parallel programs, which 
can be proven correct easily by using the atomic block optimization. However, 
we force the tool to do the unnecessary hard work. These instances simulate the 
worst case theoretical complexity where the proof checking time and number of 
counterexamples grow exponentially with the number of threads and the sizes of 
the threads. In the largest instance, more than 99% of the total verification time 
is spent in proof checking. Averages are not very informative for these instances, 
and therefore are not included in Table 1. 

Finally, WEAVER is only slow for verifying 3-safety properties of large loop- 
ing benchmarks from [24]. Note that unlike the approach in [24], which starts 
from a default lockstep reduction (that is incidentally sufficient to prove these 
instances), we do not assume any reduction and consider them all. The extra 
time is therefore expected when the product programs become quite large. 


9 Related Work 


The notion of a k-safety hyperproperty was introduced in [7] without consideration for automatic program verification. The approach of reducing k-safety to 1-safety by self-composition is introduced in [5]. While theoretically complete, self-composition is not practical as discussed in Sect. 1. Product programs generalize the self-composition approach and have been used in verifying translation validation [20], non-interference [16,23], and program optimization [25]. A product of two programs $P_1$ and $P_2$ is semantically equivalent to $P_1 \cdot P_2$ (sequential composition), but is made easier to verify by allowing parts of each program to be interleaved. The product programs proposed in [3] allow lockstep interleaving exclusively, but only when the control structures of $P_1$ and $P_2$ match. This restriction is lifted in [4] to allow some non-lockstep interleavings. However, the given construction rules are non-deterministic, and the choice of product program is left to the user or a heuristic.

Relational program logics [6,28] extend traditional program logics to allow
reasoning about relational program properties; however, automation is usually
not addressed. Automatic construction of product programs is discussed in [10]
with the goal of supporting procedure specifications and modular reasoning,
but it is also restricted to lockstep interleavings. Our approach does not support
procedure calls, but it is fully automated and permits non-lockstep interleavings.

The key feature of our approach is the combined, automated discovery of
an appropriate program reduction and a proof. In this respect, the only
comparable method is the one based on Cartesian Hoare Logic (CHL)
proposed in [24], along with an algorithm for automatic verification based on
CHL. Their algorithm implicitly constructs a product program, using
a heuristic that favours lockstep executions as much as possible, and then prioritizes
certain rules of the logic over the rest. The heuristic nature of the proof
search means that no characterization of the search space can be given, and
there is no guarantee that an appropriate product program will be found. In
contrast, this paper gives a formal characterization of the set of explored product
programs. Moreover, CHL was not designed to deal with concurrency.

Lipton [19] first proposed reduction as a way to simplify reasoning about
concurrent programs. His ideas have been employed in a semi-automatic setting
in [11]. Partial-order reduction (POR) is a class of techniques that reduce
the state space of the search by removing redundant paths. POR techniques are
concerned with finding a single (preferably minimal) reduction of the input program.
In contrast, we use the same underlying ideas to explore many program
reductions simultaneously. The class of reductions described in Sect. 6 is based
on the sleep set technique of Godefroid [15]. Other techniques exist [1,15] that
are used in conjunction with sleep sets to achieve minimality in a standard POR
setting. In our setting, reductions generated by sleep sets are already optimal
(Theorem 6.7). However, these additional POR techniques may suggest ways
of optimizing our proof checking algorithm by producing a smaller
reduction LTA.
Abstract. System development often involves decisions about how a high-level
design is to be implemented using primitives from a low-level platform. Certain 
decisions, however, may introduce undesirable behavior into the resulting imple- 
mentation, possibly leading to a violation of a desired property that has already 
been established at the design level. In this paper, we introduce the problem of 
synthesizing a property-preserving platform mapping: synthesize a set of imple- 
mentation decisions ensuring that a desired property is preserved from a high- 
level design into a low-level platform implementation. We formalize this synthe- 
sis problem and propose a technique for generating a mapping based on symbolic 
constraint search. We describe our prototype implementation, and two real-world 
case studies demonstrating the applicability of our technique to the synthesis of 
secure mappings for the popular web authorization protocols OAuth 1.0 and 2.0. 


1 Introduction 


When building a complex software system, one may begin by coming up with an
abstract design and then constructing an implementation that conforms to this design.
In practice, there is rarely enough time or resources available to build an implementation
from scratch, so this process often involves reuse of an existing platform: a
collection of generic components, data structures, and libraries that are used to build an
application in a particular domain.

The benefits of reuse also come with potential risks. A typical platform exhibits 
its own complex behavior, including subtle interactions with the environment that may 
be difficult to anticipate and reason about. Typically, the developer must work with 
the platform as it exists, and is rarely given the luxury of being able to modify it and 
remove unwanted features. For example, when building a web application, a developer 
must work with a standard browser and take into account all its features and security 
vulnerabilities. As a result, achieving an implementation that perfectly conforms to the
design (in the traditional notion of behavioral refinement [20]) may be too difficult
in practice. Worse, the resulting implementation may not necessarily preserve desirable
properties that have already been established at the level of design.

These risks are especially evident in applications where security is a major con- 
cern. For example, OAuth 2.0, a popular authorization protocol subjected to rigorous 
and formal analysis at an abstract level [9,33,42], has been shown to be vulnerable to 
attacks when implemented on a web browser or a mobile device [10,39,41]. Many of 
these vulnerabilities are not due to simple programming errors: They arise from logi- 
cal flaws that involve a subtle interaction between the protocol logic and the details of 
the underlying platform. Unfortunately, OAuth itself does not explicitly guard against 
these flaws, since it is intended to be a generic, abstract protocol that deliberately omits 
details about potential platforms. On the other hand, anticipating and mitigating
these risks requires an in-depth understanding of the platform and security expertise,
which many developers do not possess.

This paper proposes an approach to help developers overcome these risks and 
achieve an implementation that preserves desired properties. In particular, we formu- 
late this task as the problem of automatically synthesizing a property-preserving plat- 
form mapping: A set of implementation decisions ensuring that a desired property is 
preserved from a high-level design into a low-level platform implementation. 

Our approach builds on the prior work of Kang et al. [28], which proposes a mod- 
eling and verification framework for reasoning about security attacks across multiple 
levels of abstraction. The central notion in this framework is that of a mapping, which 
captures a developer’s decisions about how abstract system entities are to be realized in 
terms of their concrete counterparts. In this paper, we fix a bug in the formalization of 
mapping in [28] and extend the framework of [28] with the novel problem of synthe- 
sizing a property-preserving mapping. In addition, we present an algorithmic technique 
for performing this synthesis task. Our technique, inspired by the highly successful 
paradigms of sketching and syntax-guided synthesis [3,26,37,38], takes a constraint 
generalization approach to (1) quickly prune the search space and (2) produce a solu- 
tion that is maximal (i.e., a largest set of mappings that preserve a given property). 

We have built a prototype implementation of the synthesis technique. Our tool 
accepts a high-level design model, a desired system property (both specified by the 
developer), and a model of a low-level platform (built and maintained separately by 
a domain expert). The tool then produces a maximal set of mappings (if one exists) 
that would ensure that the resulting platform implementation preserves the given property.
We have successfully applied our tool to synthesize property-preserving mappings
for two non-trivial case studies: the authorization protocols OAuth 1.0 and 2.0
implemented on top of HTTP. Our results are promising: The implementation deci- 
sions captured by our synthesized mappings describe effective mitigations against some 
of the common vulnerabilities that have been found in deployed OAuth implementa- 
tions [39,41]. 

The contributions of this paper include: a formal treatment of mapping, including
a correction to the original definition [28] (Sect. 2); a formulation of the mapping
synthesis problem, a novel approach for ensuring the preservation of a property between
a high-level design and its platform implementation (Sect. 3); a technique for
automatically synthesizing mappings based on symbolic constraint search (Sect. 4); and a
prototype implementation of the synthesis technique along with real-world case studies
demonstrating the feasibility of this approach (Sect. 5). We conclude with a discussion
of related work (Sect. 6).


2 Mapping Composition 


Our approach builds on the modeling and verification framework proposed by Kang 
et al. [28], which is designed to allow modular reasoning about the behavior of processes
across multiple abstraction layers. In this framework, a trace-based semantic model 
(based on CSP [21]) is extended to represent events as sets of labels, and includes a 
new composition operator based on the notion of mappings, which relate event labels 
from one abstraction layer to another. In this section, we present the essential elements 
of this framework. 


[Figure: (a) Abstract Channel, with processes Alice and Eve; (b) Public Channel, with processes Sender and $\mathit{Recv}_x$; (c) Labels and events:
$L_{Alice} = \{a.b.p, a.b.s, a.e.p, a.e.s\}$, $E_{Alice} = \{\{a.b.p\}, \{a.b.s\}, \{a.e.p\}\}$;
$L_{Eve} = \{a.e.p, a.e.s, u.e.p, u.e.s\}$, $E_{Eve} = \{\{a.e.p\}, \{a.e.s\}, \{u.e.p\}, \{u.e.s\}\}$;
$L_{Sender} = L_{Recv_x} = \{p, s, p.x, s.x, p.y, s.y\}$;
$E_{Sender} = \{\{p\}, \{s\}, \{p.x\}, \{s.x\}, \{p.y\}, \{s.y\}\}$, $E_{Recv_x} = \{\{p\}, \{s\}, \{p.x\}, \{s.x\}\}$.]


Fig. 1. A pair of high-level (abstract) and low-level (public) communication models. Note that 
each event is a set of labels, where each label describes one possible representation of the event. 


Running Example. Consider a simple example involving communication of messages
among a set of processes. In our modeling approach, the communication of a message
is represented by labels of the form sender.receiver.message. For example, label a.e.p
represents Alice sending Eve a public, non-secret message. Similarly, a.b.s represents
Alice sending a secret message to another process (b for Bob, for example). In this
system, Alice is unwilling to share its secret with Eve; in Fig. 1(a), this is modeled by the
absence of any transition on event {a.e.s} in the Alice process.

Eve is a malicious character whose goal is to learn Alice's secret. Besides a.e.p and
a.e.s, Eve is associated with two additional labels, u.e.p and u.e.s, which represent 
receiving a public or secret message, respectively, through some unknown sender u. 
Conceptually, these two latter labels can be regarded as side channels [30] that Eve 
uses to obtain information. 

A desirable property of this abstract communication system is that Eve should never
be able to learn Alice's secret¹. In this case, it can be easily observed that the property
holds, since Alice, by design, never sends the secret to Eve.


¹ A formalization of this property is provided later in this section.
The model in Fig. 1(b) describes communication over a low-level public channel
that is shared among all processes. A message sent over this channel may be encrypted
using a key, as captured by labels of the form message.key. For instance, p.x and s.x
represent the transmission of a public and a secret message, respectively, using key x.
A message may also be sent in plaintext by omitting an encryption key (e.g., label s
represents the plaintext transmission of a secret). Each receiver on the public channel is
assumed to know only a single key; for instance, $\mathit{Recv}_x$ only knows key x
and thus cannot receive messages that are encrypted using key y (i.e., labels p.y and s.y
do not appear in events of $\mathit{Recv}_x$).

Suppose that we wish to reason about the behavior of the abstract communication 
system from Fig. 1(a) when it is implemented over the public channel in Fig. 1(b). In 
particular, in the low-level implementation, Eve and other processes (e.g., Bob) are 
required to share the same channel, no longer benefitting from the separation provided 
by the abstraction in Fig. 1(a). Does the property of the abstract communication hold 
in every possible implementation? If not, which decisions ensure that Alice’s secret 
remains protected from Eve? We formulate these questions as the problem of synthesiz- 
ing a property-preserving mapping between a pair of high-level and low-level models. 


Events, Traces, and Processes. Let $\mathcal{L}$ be a potentially infinite set of labels. An event $e$
is a finite, non-empty set of labels: $e \in E(\mathcal{L})$, where $E(\mathcal{L})$ is the set of all finite subsets
of $\mathcal{L}$ except the empty set $\emptyset$. Let $S^*$ be the set of all finite sequences of elements of set $S$.
A trace $t$ is a finite sequence of events: $t \in T(\mathcal{L})$, where $T(\mathcal{L})$ is the set of all traces over
$\mathcal{L}$ (i.e., $T(\mathcal{L}) = (E(\mathcal{L}))^*$). The empty trace is denoted by $\langle\rangle$, and the trace consisting of
a sequence of events $e_1, e_2, \ldots$ is denoted $\langle e_1, e_2, \ldots \rangle$. If $t$ and $t'$ are traces, then $t \cdot t'$ is
the trace obtained by concatenating $t$ and $t'$. Note that $\langle\rangle \cdot t = t \cdot \langle\rangle = t$ for any trace $t$.

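Let $t$ be a trace over a set of labels $\mathcal{L}$, and let $A \subseteq \mathcal{L}$ be a subset of $\mathcal{L}$. The projection
of $t$ onto $A$, denoted $t \upharpoonright A$, is defined as follows:

$$\langle\rangle \upharpoonright A = \langle\rangle \qquad (\langle e \rangle \cdot t) \upharpoonright A = \begin{cases} \langle e \cap A \rangle \cdot (t \upharpoonright A) & \text{if } e \cap A \neq \emptyset \\ t \upharpoonright A & \text{otherwise} \end{cases}$$

For example, if $t = \langle \{a\}, \{a,c\}, \{b\} \rangle$, then $t \upharpoonright \{a,b\} = \langle \{a\}, \{a\}, \{b\} \rangle$ and
$t \upharpoonright \{b,c\} = \langle \{c\}, \{b\} \rangle$.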
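Operationally, the definition intersects every event with $A$ and drops the events that become empty. A minimal Python sketch of this reading, assuming (for illustration only) that events are encoded as frozensets of string labels and traces as lists:

```python
from typing import FrozenSet, List

Event = FrozenSet[str]   # an event: a non-empty finite set of labels
Trace = List[Event]      # a trace: a finite sequence of events


def project(t: Trace, a: FrozenSet[str]) -> Trace:
    """Return t restricted to the labels in a.

    Each event is intersected with a; events whose intersection is empty
    are dropped, matching the 'otherwise' case of the definition.
    """
    result: Trace = []
    for e in t:
        kept = e & a
        if kept:
            result.append(kept)
    return result


t = [frozenset({"a"}), frozenset({"a", "c"}), frozenset({"b"})]
print(project(t, frozenset({"a", "b"})))  # the trace <{a}, {a}, {b}>
print(project(t, frozenset({"b", "c"})))  # the trace <{c}, {b}>
```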

A process $P$ is defined as a triple $(L_P, E_P, T_P)$. The label set of process $P$, $L_P \subseteq \mathcal{L}$, is
the set of all labels appearing in $P$; $E_P \subseteq E(\mathcal{L})$ is the set of events that may appear
in traces of $P$; and the traces themselves are denoted by $T_P \subseteq T(\mathcal{L})$. We assume the traces of every
process $P$ to be prefix-closed, i.e., $\langle\rangle \in T_P$ and, for every non-empty trace $t' = t \cdot \langle e \rangle \in T_P$, $t \in T_P$.
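The prefix-closure assumption can be checked mechanically when $T_P$ is finite. The sketch below is illustrative: the `Process` class and its method names are hypothetical, traces are encoded as tuples of frozensets, and a finite trace set is assumed, whereas real processes may have infinitely many traces.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

Event = FrozenSet[str]
Trace = Tuple[Event, ...]  # tuples, so that traces can be stored in a set


@dataclass(frozen=True)
class Process:
    """Hypothetical encoding of a process (L_P, E_P, T_P) with a finite T_P."""
    labels: FrozenSet[str]
    events: FrozenSet[Event]
    traces: FrozenSet[Trace]

    def is_prefix_closed(self) -> bool:
        # <> must be a trace, and removing the last event of any
        # non-empty trace must again give a trace of the process.
        return () in self.traces and all(t[:-1] in self.traces for t in self.traces if t)

    def is_well_formed(self) -> bool:
        # Events are non-empty subsets of L_P, traces only use events of E_P,
        # and T_P is prefix-closed.
        events_ok = all(len(e) > 0 and e <= self.labels for e in self.events)
        traces_ok = all(e in self.events for t in self.traces for e in t)
        return events_ok and traces_ok and self.is_prefix_closed()


abp = frozenset({"a.b.p"})
alice_fragment = Process(
    labels=frozenset({"a.b.p", "a.b.s"}),
    events=frozenset({abp}),
    traces=frozenset({(), (abp,), (abp, abp)}),
)
print(alice_fragment.is_well_formed())  # True
```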


Parallel Composition. A pair of processes $P$ and $Q$ synchronize with each other by
performing events $e_1$ and $e_2$, respectively, if these two events share at least one label. In
their parallel composition, denoted $P \parallel Q$, this synchronization is represented by a new
event $e'$ that is constructed as the union of $e_1$ and $e_2$ (i.e., $e' = e_1 \cup e_2$).

Formally, let $P = (L_P, E_P, T_P)$ and $Q = (L_Q, E_Q, T_Q)$ be a pair of processes. Their
parallel composition is defined as follows:

$$E_{P \parallel Q} = \{ e \in E(L_P \cup L_Q) \mid \mathit{eventCond}(e, P) \wedge \mathit{eventCond}(e, Q) \wedge \mathit{syncCond}(e) \}$$
$$T_{P \parallel Q} = \{ t \in (E_{P \parallel Q})^* \mid (t \upharpoonright L_P) \in T_P \wedge (t \upharpoonright L_Q) \in T_Q \} \qquad \text{(Def. 1)}$$

where $L_{P \parallel Q} = L_P \cup L_Q$, and predicate eventCond is defined as

$$\mathit{eventCond}(e, P) \equiv e \cap L_P = \emptyset \vee e \cap L_P \in E_P$$


and a condition on synchronization, syncCond, is defined as

$$\mathit{syncCond}(e) \equiv e \subseteq L_P \setminus L_Q \vee e \subseteq L_Q \setminus L_P \vee (\exists a \in e : a \in L_P \cap L_Q) \qquad \text{(Cond. 1)}$$


The definition of $T_{P \parallel Q}$ states that if we take a trace $t$ in the composite process and
ignore labels that appear only in $Q$, then the resulting trace must be a valid trace of $P$
(and symmetrically for $Q$). The condition (Cond. 1) is imposed on every event appearing
in $T_{P \parallel Q}$ to ensure that an event performed together by $P$ and $Q$ contains at least one
common label shared by both processes.
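When the label sets are finite, $E(L_P \cup L_Q)$ is finite too, so $E_{P \parallel Q}$ from Def. 1 can be computed by brute-force enumeration. The following Python sketch is a direct, unoptimized transcription under that assumption; the function names are ours, and the traces $T_{P \parallel Q}$ are not constructed.

```python
from itertools import combinations
from typing import FrozenSet, Iterator, Set

Event = FrozenSet[str]


def nonempty_subsets(labels: FrozenSet[str]) -> Iterator[Event]:
    """Enumerate E(L) for a finite label set L, i.e. all non-empty subsets."""
    items = sorted(labels)
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def event_cond(e: Event, lp: FrozenSet[str], ep: Set[Event]) -> bool:
    """eventCond(e, P): P takes no part in e, or P's share of e is an event of P."""
    share = e & lp
    return not share or share in ep


def sync_cond(e: Event, lp: FrozenSet[str], lq: FrozenSet[str]) -> bool:
    """Cond. 1: e is private to P, private to Q, or contains a label shared by both."""
    return e <= lp - lq or e <= lq - lp or any(a in lp and a in lq for a in e)


def parallel_events(lp: FrozenSet[str], ep: Set[Event],
                    lq: FrozenSet[str], eq: Set[Event]) -> Set[Event]:
    """E_{P||Q} of Def. 1, by brute force over E(L_P ∪ L_Q)."""
    return {e for e in nonempty_subsets(lp | lq)
            if event_cond(e, lp, ep) and event_cond(e, lq, eq) and sync_cond(e, lp, lq)}


f = frozenset
composed = parallel_events(f({"a", "s"}), {f({"a", "s"})}, f({"s", "u"}), {f({"s", "u"})})
print(composed == {f({"a", "s", "u"})})  # True: the only event is {a, s} ∪ {s, u}
```

In the example, the only admissible event is the union $\{a, s\} \cup \{s, u\}$: the processes share label s, and smaller events such as $\{a\}$ or $\{s\}$ are rejected by eventCond because neither process can perform them alone.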

This type of parallel composition can be seen as a generalization of the parallel 
composition of CSP [21], from single labels to sets of labels. That is, the CSP parallel 
composition is the special case of the composition of Def. 1 where every event is a 
singleton (i.e., it contains exactly one label). Note that if event e contains exactly one 
label a, then a must belong to the alphabet of P or that of Q, which means syncCond(e) 
always evaluates to true. The resulting expression in that case 


$$T_{P \parallel Q} = \{ t \in T(L_P \cup L_Q) \mid (t \upharpoonright L_P) \in T_P \wedge (t \upharpoonright L_Q) \in T_Q \}$$


is equivalent to the definition of parallel composition in CSP [21, Sec. 2.3.3]. 


Mapping Composition. A mapping $m$ over a set of labels $\mathcal{L}$ is a partial function $m : \mathcal{L} \rightharpoonup \mathcal{L}$.
Informally, $m(a) = b$ stipulates that every event that contains $a$ as a label is to be
assigned $b$ as an additional label. We sometimes use the notations $a \mapsto_m b$ or $(a, b) \in m$
as alternatives to $m(a) = b$. When we write $m(a) = b$, we mean that $m(a)$ is defined
and is equal to $b$. The empty mapping, denoted $m = \emptyset$, is the partial function $m : \mathcal{L} \rightharpoonup \mathcal{L}$
which is undefined for all $a \in \mathcal{L}$.

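Mapping composition allows a pair of processes to interact with each other over distinct
labels. Formally, consider two processes $P = (L_P, E_P, T_P)$ and $Q = (L_Q, E_Q, T_Q)$,
and let $L = L_P \cup L_Q$. Given a mapping $m : L \rightharpoonup L$, the mapping composition $P \parallel_m Q$ is
defined as follows:

$$E_{P \parallel_m Q} = \{ e \in E(L_P \cup L_Q) \mid \mathit{eventCond}(e, P) \wedge \mathit{eventCond}(e, Q) \wedge \mathit{syncCond}'(e) \wedge \mathit{mapCond}(e, m) \}$$
$$T_{P \parallel_m Q} = \{ t \in (E_{P \parallel_m Q})^* \mid (t \upharpoonright L_P) \in T_P \wedge (t \upharpoonright L_Q) \in T_Q \} \qquad \text{(Def. 2)}$$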


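where $L_{P \parallel_m Q} = L_P \cup L_Q$, and $\mathit{syncCond}'(e)$ and $\mathit{mapCond}(e, m)$ are defined as:

$$\mathit{syncCond}'(e) \equiv \mathit{syncCond}(e) \vee (\exists a \in e \cap L_P, \exists b \in e \cap L_Q : m(a) = b \vee m(b) = a)$$
$$\mathit{mapCond}(e, m) \equiv (\forall a \in e : a \in \mathit{dom}(m) \Rightarrow m(a) \in e)$$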


where $\mathit{dom}(m)$ is the domain of function $m$. Compared to Def. 1, the additional disjunct
in $\mathit{syncCond}'(e)$ allows $P$ and $Q$ to synchronize even when they do not share any label,
provided that at least one pair of their labels is related by $m$. The predicate mapCond
ensures that if an event $e$ contains a label $a$ and $m$ is defined over $a$, then $e$ also contains
the label that $a$ is mapped to.
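The two extra predicates of Def. 2 are equally easy to transcribe. The sketch below, again a hypothetical brute-force illustration with mappings stored as Python dicts, contrasts a one-entry mapping with the empty mapping on a small fragment of the running example (a one-label Alice and a two-label sender, chosen for brevity).

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterator, Set

Event = FrozenSet[str]
Mapping = Dict[str, str]  # a partial function m : L -> L stored as a dict


def nonempty_subsets(labels: FrozenSet[str]) -> Iterator[Event]:
    items = sorted(labels)
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def event_cond(e: Event, lp: FrozenSet[str], ep: Set[Event]) -> bool:
    share = e & lp
    return not share or share in ep


def sync_cond_prime(e: Event, lp: FrozenSet[str], lq: FrozenSet[str], m: Mapping) -> bool:
    """syncCond'(e): Cond. 1 holds, or a P-label and a Q-label of e are related by m."""
    cond1 = e <= lp - lq or e <= lq - lp or any(a in lp and a in lq for a in e)
    related = any(m.get(a) == b or m.get(b) == a for a in e & lp for b in e & lq)
    return cond1 or related


def map_cond(e: Event, m: Mapping) -> bool:
    """mapCond(e, m): for every label a of e in dom(m), m(a) is also in e."""
    return all(m[a] in e for a in e if a in m)


def mapping_events(lp, ep, lq, eq, m: Mapping) -> Set[Event]:
    """E_{P||_m Q} of Def. 2, by brute force over E(L_P ∪ L_Q)."""
    return {e for e in nonempty_subsets(lp | lq)
            if event_cond(e, lp, ep) and event_cond(e, lq, eq)
            and sync_cond_prime(e, lp, lq, m) and map_cond(e, m)}


f = frozenset
alice_l, alice_e = f({"a.b.s"}), {f({"a.b.s"})}
sender_l, sender_e = f({"s", "s.y"}), {f({"s"}), f({"s.y"})}

with_m = mapping_events(alice_l, alice_e, sender_l, sender_e, {"a.b.s": "s"})
without_m = mapping_events(alice_l, alice_e, sender_l, sender_e, {})
print(f({"a.b.s", "s"}) in with_m)   # True: synchronized via a.b.s |-> s
print(f({"a.b.s"}) in with_m)        # False: mapCond demands the image s
print(f({"a.b.s"}) in without_m)     # True: with m empty, Alice acts alone
```

With the empty mapping, the result coincides with what Def. 1 yields for the same processes, which is consistent with Lemma 1 below.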

Note that Def. 2 is different from the definition of mapping composition in [28],
and corrects a flaw in the latter. In particular, the definition in [28] omits condition
syncCond', which permits the undesirable case in which events $e_1$ and $e_2$ from $P$ and $Q$
are synchronized into the union $e = e_1 \cup e_2$ even when the events do not share any label.


Example. Let $P$ and $Q$ be the abstract and public channel communication models from
Fig. 1(a) and (b), respectively. The property that Eve never learns Alice's secret can be
stated as follows:

$$\Phi \equiv \neg (\exists e \in E(\mathcal{L}), \exists l_1, l_2 \in e : l_1 = a.{*}.s \wedge l_2 = {*}.e.s)$$

where $* \in \{a, b, e, u\}$. In other words, Eve should never be able to engage in an event
that involves the transmission of Alice's secret. From Fig. 1(a), it can be observed that
$P = \mathit{Alice} \parallel \mathit{Eve} \models \Phi$.

Suppose that we decide on a simple implementation scheme where the abstract
messages sent by Alice are transmitted over the public channel in plaintext; this decision
can be encoded as a mapping, $m_1$, where each abstract label (i.e., each label in $L_{Alice} \cup L_{Eve}$
in Fig. 1(c)) is mapped to concrete label p or s as follows:

$$a.b.p,\ a.e.p,\ u.e.p \mapsto_{m_1} p \qquad a.b.s,\ a.e.s,\ u.e.s \mapsto_{m_1} s$$


The resulting implementation can be constructed as process
$I_{m_1} = (\mathit{Alice} \parallel_{m_1} \mathit{Sender}) \parallel (\mathit{Eve} \parallel_{m_1} \mathit{Recv}_x)$. Due to the definition of mapping
composition (Def. 2), the following trace may appear in the overall composite process:

$$\langle \{a.b.s, s, u.e.s\} \rangle \in T_{I_{m_1}}$$

Note that this trace violates the above property (i.e., $I_{m_1} \not\models \Phi$). This can be seen
as an example of abstraction violation: as a result of the decisions in $m_1$, a.b.s and u.e.s
now share the same underlying representation (s), and Eve is able to engage in an event
with a label (a.b.s) that was not previously available to it in the abstract model.
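The violation can be reproduced by brute force. The sketch below rebuilds the label and event sets listed in Fig. 1(c), encodes $m_1$ (and, for contrast, the mapping $m_2$ discussed in Sect. 3), and collects the events of each composite process that violate $\Phi$. It works on events only and assumes, as a simplification, that every event of each process may occur as the first step of a trace, so that a violating event corresponds to a violating one-event trace; all names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterator, Set, Tuple

Event = FrozenSet[str]
Proc = Tuple[FrozenSet[str], Set[Event]]  # (labels, events); traces are not modelled


def nonempty_subsets(labels: FrozenSet[str]) -> Iterator[Event]:
    items = sorted(labels)
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def compose(p: Proc, q: Proc, m: Dict[str, str]) -> Proc:
    """Labels and events of P ||_m Q (Def. 2); with m = {} this is P || Q (Def. 1)."""
    (lp, ep), (lq, eq) = p, q

    def admissible(e: Event) -> bool:
        sp, sq = e & lp, e & lq
        event_ok = (not sp or sp in ep) and (not sq or sq in eq)
        sync_ok = (e <= lp - lq or e <= lq - lp or bool(sp & sq)
                   or any(m.get(a) == b or m.get(b) == a for a in sp for b in sq))
        map_ok = all(m[a] in e for a in e if a in m)
        return event_ok and sync_ok and map_ok

    return lp | lq, {e for e in nonempty_subsets(lp | lq) if admissible(e)}


def singletons(*labels: str) -> Set[Event]:
    return {frozenset({x}) for x in labels}


CHANNEL = frozenset({"p", "s", "p.x", "s.x", "p.y", "s.y"})
ALICE: Proc = (frozenset({"a.b.p", "a.b.s", "a.e.p", "a.e.s"}), singletons("a.b.p", "a.b.s", "a.e.p"))
EVE: Proc = (frozenset({"a.e.p", "a.e.s", "u.e.p", "u.e.s"}), singletons("a.e.p", "a.e.s", "u.e.p", "u.e.s"))
SENDER: Proc = (CHANNEL, singletons(*CHANNEL))
RECV_X: Proc = (CHANNEL, singletons("p", "s", "p.x", "s.x"))

M1 = {"a.b.p": "p", "a.e.p": "p", "u.e.p": "p", "a.b.s": "s", "a.e.s": "s", "u.e.s": "s"}
M2 = {"a.b.p": "p.y", "a.b.s": "s.y", "a.e.p": "p.x", "a.e.s": "s.y"}
AGENTS = {"a", "b", "e", "u"}


def matches(label: str, pattern: str) -> bool:
    """Does label match a pattern such as 'a.*.s', where * is any agent?"""
    parts, pats = label.split("."), pattern.split(".")
    return len(parts) == len(pats) and all(
        (q == "*" and x in AGENTS) or q == x for x, q in zip(parts, pats))


def violates_phi(e: Event) -> bool:
    """e has l1 matching a.*.s and l2 matching *.e.s (l1 and l2 may coincide)."""
    return any(matches(lab, "a.*.s") for lab in e) and any(matches(lab, "*.e.s") for lab in e)


def implementation(m: Dict[str, str]) -> Proc:
    """I_m = (Alice ||_m Sender) || (Eve ||_m Recv_x)."""
    return compose(compose(ALICE, SENDER, m), compose(EVE, RECV_X, m), {})


abstract_bad = {e for e in compose(ALICE, EVE, {})[1] if violates_phi(e)}
m1_bad = {e for e in implementation(M1)[1] if violates_phi(e)}
m2_bad = {e for e in implementation(M2)[1] if violates_phi(e)}
print(len(abstract_bad), len(m1_bad), len(m2_bad))  # 0 1 0
```

Under these assumptions, the abstract composition and $I_{m_2}$ have no violating event, while $I_{m_1}$ has exactly one, namely $\{a.b.s, s, u.e.s\}$.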
Properties of the Mapping Composition Operator. Mapping composition is a gener- 
alization of parallel composition: The latter is a special case of mapping composition 
where the given mapping is empty: 


Lemma 1. Given a pair of processes $P$ and $Q$, if $m = \emptyset$ then $P \parallel_m Q = P \parallel Q$.


Commutativity. The proposed mapping composition operator is commutative, i.e.,
$P \parallel_m Q = Q \parallel_m P$. This follows from the fact that Def. 2 is symmetric
with respect to $P$ and $Q$. Since parallel composition is a special case of mapping
composition, the parallel composition operator is also commutative.


Associativity. The mapping composition operator is associative under the following 
conditions on the alphabets of involved processes and mappings: 


Theorem 1. Given processes $P$, $Q$, and $R$, let $X = (P \parallel_{m_1} Q) \parallel_{m_2} R$ and
$Y = P \parallel_{m_3} (Q \parallel_{m_4} R)$. If $E_X = E_Y$, then $X = Y$.


Proof. Available in the extended version of this paper [27]. 
3 Synthesis Problems


The mapping verification problem is to check, given processes $P$ and $Q$, mapping $m$, and
specification $\Phi$, whether $(P \parallel_m Q) \models \Phi$. This problem was studied by Kang et al. [28].
In this paper, we introduce and study, for the first time to our knowledge, the problem
of mapping synthesis. We begin with a simple formulation of the problem and then
generalize it. We do not define exactly what the specification $\Phi$ may be, nor the
satisfaction relation $\models$, since the mapping synthesis problems defined below are generic
and can work with any type of specification or satisfaction relation. In Sect. 5.1, we
discuss how this generic framework is instantiated in our implementation.


Problem 1 (Mapping Synthesis). Given processes $P$ and $Q$, and specification $\Phi$, find,
if it exists, a mapping $m$ such that $(P \parallel_m Q) \models \Phi$. We call such an $m$ a valid mapping.


Note that if $\Phi$ is a trace property [2,29], this problem can be stated as an $\exists\forall$ problem;
that is, finding a witness $m$ to the formula $\exists m : \forall t \in T_{P \parallel_m Q} : t \in \Phi$.

Instead of synthesizing $m$ from scratch, the developer may wish to express their
partial system knowledge as a given constraint, and ask the synthesis tool to generate
a mapping that adheres to this constraint. For instance, given labels $a, b, c \in \mathcal{L}$, one
may express a constraint that $a$ must be mapped to either $b$ or $c$ as part of every valid
mapping; this gives rise to two possible candidate mappings, $m_1$ and $m_2$, where $m_1(a) = b$
and $m_2(a) = c$. Formally, let $M$ be the set of all possible mappings over labels $\mathcal{L}$.
A mapping constraint $C \subseteq M$ is a set of mappings that are considered legal candidates
for a final, synthesized valid mapping. Then, the problem of synthesizing a mapping
given a constraint can be formulated as follows:


Problem 2 (Generalized Mapping Synthesis). Given processes $P$ and $Q$, specification
$\Phi$, and mapping constraint $C$, find, if it exists, a valid mapping $m$ such that $m \in C$.


Note that Problem 1 is a special case of Problem 2 where C = M. The synthesis problem 
can be further generalized to one that involves synthesizing a constraint that contains a 
set of valid mappings: 


Problem 3 (Mapping Constraint Synthesis). Given processes $P$ and $Q$, specification $\Phi$,
and mapping constraint $C$, generate, if it exists, a non-empty set of valid mappings $C'$
such that $C' \subseteq C$. We call such a $C'$ valid with respect to $P$, $Q$, $\Phi$, and $C$.


A procedure for solving Problem 3 can be used to solve Problem 2: having generated
constraint $C'$, we can pick any mapping $m \in C'$. Such an $m$ is guaranteed to be valid
and also to belong to $C$.

In practice, it is desirable for $C'$ to be as large as possible while still being valid,
as it provides more implementation choices (i.e., possible mappings). In particular, we
say that a mapping constraint $C'$ is maximal with respect to $P$, $Q$, $\Phi$, and $C$ if and only
if (1) $C'$ is valid with respect to $P$, $Q$, $\Phi$, and $C$, and (2) there exists no other constraint
$C''$ such that $C''$ is also valid w.r.t. $P$, $Q$, $\Phi$, $C$, and $C' \subset C''$. Then, our final synthesis
problem can be stated as follows:


Problem 4 (Maximal Constraint Synthesis). Given processes $P$ and $Q$, property $\Phi$, and
constraint $C$, generate, if it exists, a maximal constraint $C'$ with respect to $P$, $Q$, $\Phi$, $C$.
If found, $C'$ is a locally optimal solution. In general, there may be multiple maximal
constraints for given $P$, $Q$, $\Phi$, and $C$.


Example. Returning to our running example, an alternative implementation of the abstract
communication model over the public channel involves encrypting messages sent by
Alice to Bob using a key (y) that Eve does not possess; this decision can be encoded as
the following valid mapping $m_2$:

$$a.b.p \mapsto_{m_2} p.y \qquad a.b.s \mapsto_{m_2} s.y \qquad a.e.p \mapsto_{m_2} p.x \qquad a.e.s \mapsto_{m_2} s.y$$


Since Eve cannot read messages encrypted using key y, she is unable to obtain Alice's
secret over the public channel; thus, $I_{m_2} \models \Phi$, where
$I_{m_2} = (\mathit{Alice} \parallel_{m_2} \mathit{Sender}) \parallel (\mathit{Eve} \parallel_{m_2} \mathit{Recv}_x)$.

The following mapping, $m_3$, which leaves non-secret messages unencrypted in the
low-level channel (as p), is also valid with respect to $\Phi$:

$$a.b.p \mapsto_{m_3} p \qquad a.b.s \mapsto_{m_3} s.y \qquad a.e.p \mapsto_{m_3} p \qquad a.e.s \mapsto_{m_3} s.y$$


since Eve being able to read non-secret messages does not violate the property. Thus,
the developer may choose either $m_2$ or $m_3$ to implement the abstract channel and ensure
that Alice's secret remains protected from Eve. In other words, $C_1 = \{m_2, m_3\}$ is a valid
(but not necessarily maximal) mapping constraint with respect to the desired property.
Furthermore, $C_1$ is arguably more desirable than another constraint $C_2 = \{m_2\}$, since
the former gives the developer more implementation choices than the latter does.


4 Synthesis Technique 


Mapping Representation. In our approach, mappings are represented symbolically as 
logical expressions over variables that correspond to labels being mapped. The sym- 
bolic representation has the following advantages over an explicit one (where the entries 
of mapping m are enumerated explicitly): (1) it provides a succinct representation of 
implementation decisions to the developer (which is especially important as the size of 
the mapping grows large) and (2) it allows the user to specify partial implementation 
decisions (i.e., given constraint C) in a declarative manner. 

We adopt the symbolic representation and, inspired by SyGuS [3], use a syntactic 
approach where the space of candidate mapping constraints is restricted to expressions 
that can be constructed from a given grammar. Our grammar is specified as follows: 


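$$\begin{aligned} \mathit{Term} &:= \mathit{Var} \mid \mathit{Const} \qquad \mathit{Assign} := (\mathit{Term} = \mathit{Term}) \\ \mathit{Expr} &:= \mathit{Assign} \mid \neg \mathit{Assign} \mid \mathit{Assign} \Rightarrow \mathit{Assign} \mid \mathit{Expr} \wedge \mathit{Expr} \end{aligned}$$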


where Var is a set of variables that represent parameters inside a label, and Const is
the set of constant values. Intuitively, this grammar captures implementation decisions
that involve assignments of parameters in an abstract label to their counterparts in a
concrete label (represented by the equality operator "="). A logical implication is used
to construct a conditional assignment of a parameter.
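To make the grammar concrete, here is a small Python sketch of its abstract syntax together with an evaluator that decides whether an expression holds for given parameter values. The class names and the `holds` function are our own illustration, not part of the paper's tool.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Var:
    name: str    # a parameter inside a label, e.g. msg or key


@dataclass(frozen=True)
class Const:
    value: str   # a constant value, e.g. "s" or "y"


Term = Union[Var, Const]


@dataclass(frozen=True)
class Assign:    # Assign := (Term = Term)
    lhs: Term
    rhs: Term


@dataclass(frozen=True)
class Not:       # ¬Assign
    arg: Assign


@dataclass(frozen=True)
class Implies:   # Assign ⇒ Assign, a conditional assignment
    cond: Assign
    then: Assign


@dataclass(frozen=True)
class And:       # Expr ∧ Expr
    left: "Expr"
    right: "Expr"


Expr = Union[Assign, Not, Implies, And]


def term_value(t: Term, env: Dict[str, str]) -> str:
    return env[t.name] if isinstance(t, Var) else t.value


def holds(e: Expr, env: Dict[str, str]) -> bool:
    """Evaluate an expression under a binding of label parameters to values."""
    if isinstance(e, Assign):
        return term_value(e.lhs, env) == term_value(e.rhs, env)
    if isinstance(e, Not):
        return not holds(e.arg, env)
    if isinstance(e, Implies):
        return (not holds(e.cond, env)) or holds(e.then, env)
    if isinstance(e, And):
        return holds(e.left, env) and holds(e.right, env)
    raise TypeError(f"not an expression: {e!r}")


# msg = msg' ∧ (msg = s ⇒ key = y)
expr = And(Assign(Var("msg"), Var("msg'")),
           Implies(Assign(Var("msg"), Const("s")), Assign(Var("key"), Const("y"))))
print(holds(expr, {"msg": "s", "msg'": "s", "key": "y"}))  # True
print(holds(expr, {"msg": "s", "msg'": "s", "key": "x"}))  # False: secret not under key y
```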
A mapping constraint is symbolically represented as a set of predicates, each of
the form $X(\mathit{abs}, \mathit{conc})$ over symbolic labels abs and conc, where abs represents the
label being mapped to conc. The body of each predicate is constructed as an expression
from the above grammar. For example, let $\mathit{abs} = a.b.\mathit{msg}$ be a symbolic encoding of
labels that represent Alice communicating to Bob, with variable msg corresponding to
the message being sent; similarly, let $\mathit{conc} = \mathit{msg}'.\mathit{key}$ be a symbolic label in the public
channel model, where msg' and key correspond to the message being transmitted and
the key used to encrypt it (if any). Then, the expression

$$X(a.b.\mathit{msg}, \mathit{msg}'.\mathit{key}) \equiv \mathit{msg} = \mathit{msg}' \wedge (\mathit{msg} = s \Rightarrow \mathit{key} = y)$$


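states that (1) parameter msg in the abstract label must be equal to that in the concrete
label (i.e., the message being transmitted must be preserved during the mapping) and
(2) if the message is a secret, key y must be used to encrypt it in the implementation.
The set of mappings that predicate $X(\mathit{abs}, \mathit{conc})$ represents is defined as:

$$C = \{ m : \mathcal{L} \rightharpoonup \mathcal{L} \mid \forall \mathit{abs} \in \mathcal{L} : (\mathit{abs} \in \mathit{dom}(m) \Leftrightarrow \exists \mathit{conc} \in \mathcal{L} : X(\mathit{abs}, \mathit{conc})) \wedge (\mathit{abs} \in \mathit{dom}(m) \Rightarrow X(\mathit{abs}, m(\mathit{abs}))) \}$$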


That is, a mapping $m$ is allowed by $X(\mathit{abs}, \mathit{conc})$ if and only if, for each label abs, (1)
$m$ is defined over abs if and only if there exists some label conc for which $X(\mathit{abs}, \mathit{conc})$
evaluates to true, and (2) $m$ maps abs to such a label conc.
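The definition of $C$ can be enumerated directly when the label sets are finite. In the Python sketch below, the predicate $X$ is hand-encoded (hypothetically) as the example expression above, restricted to labels of the form a.b.msg, and a concrete label without a key suffix is read as plaintext; `mappings_of` then builds every mapping allowed by the definition.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple


def parse_abs(label: str) -> Tuple[str, str, str]:
    """'a.b.s' -> ('a', 'b', 's'): sender, receiver, message."""
    sender, receiver, msg = label.split(".")
    return sender, receiver, msg


def parse_conc(label: str) -> Tuple[str, Optional[str]]:
    """'s.y' -> ('s', 'y'); a plaintext label such as 'p' -> ('p', None)."""
    msg, _, key = label.partition(".")
    return msg, (key or None)


def X(abs_label: str, conc_label: str) -> bool:
    """Hypothetical encoding of X(a.b.msg, msg'.key) = msg = msg' and (msg = s => key = y)."""
    sender, receiver, msg = parse_abs(abs_label)
    if (sender, receiver) != ("a", "b"):
        return False  # the predicate only relates labels of the form a.b.msg
    msg2, key = parse_conc(conc_label)
    return msg == msg2 and (msg != "s" or key == "y")


def mappings_of(pred: Callable[[str, str], bool], abs_labels: List[str],
                conc_labels: List[str]) -> List[Dict[str, str]]:
    """All m with: abs in dom(m) iff some conc satisfies pred, and pred(abs, m(abs))."""
    options = []
    for a in abs_labels:
        choices: List[Optional[str]] = [c for c in conc_labels if pred(a, c)]
        options.append(choices or [None])  # None: m must be undefined on a
    return [{a: c for a, c in zip(abs_labels, pick) if c is not None}
            for pick in product(*options)]


ABS = ["a.b.p", "a.b.s", "a.e.p"]
CONC = ["p", "s", "p.x", "s.x", "p.y", "s.y"]
for m in mappings_of(X, ABS, CONC):
    print(m)
# {'a.b.p': 'p', 'a.b.s': 's.y'}
# {'a.b.p': 'p.x', 'a.b.s': 's.y'}
# {'a.b.p': 'p.y', 'a.b.s': 's.y'}
```

Since a.e.p satisfies the predicate for no concrete label, every allowed mapping is undefined on it, while a.b.s is forced to s.y and a.b.p may go to p, p.x, or p.y.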


Algorithmic Considerations. To ensure that the algorithm terminates, the set of
expressions that may be constructed using the given grammar is restricted to a finite
set, by bounding the domains of data types (e.g., distinct messages and keys in our
running example) and the size of expressions. We also assume the existence of a verifier
that is capable of checking whether a candidate mapping satisfies a given specification
$\Phi$. The verifier implements function $\mathit{verify}(C, P, Q, \Phi)$, which returns OK if and only if
every mapping allowed by constraint $C$ is valid with respect to $P$, $Q$, $\Phi$.


Generalization Algorithm. Once we limit the number of candidate expressions to be 
finite, we can use a brute-force algorithm to enumerate and check those candidates one 
by one. However, this naive algorithm is likely to suffer from scalability issues. Thus, 
we present an algorithm that takes a generalization-based approach to identify and prune 
undesirable parts of the search space. A key insight is that only a few implementation
decisions, captured by some minimal subset of the entries in a mapping, may be
sufficient to imply that the resulting implementation will be invalid. Thus, given some
invalid mapping, the algorithm attempts to identify this minimal subset and construct a
larger constraint $C_{bad}$ that is guaranteed to contain only invalid mappings.

The outline of the algorithm is shown in Fig. 2. The function synthesize takes four
inputs: processes $P$ and $Q$, specification $\Phi$, and a user-specified mapping constraint $C$.
It also maintains a set of constraints $X$, which keeps track of "bad" regions of the search
space that do not contain any valid mappings.

In each iteration, the algorithm selects some mapping $m$ from $C$ (line 3) and checks
whether it belongs to one of the constraints in $X$ (meaning that the mapping is guaranteed
to result in an invalid implementation). If so, it is simply discarded (lines 4-5).
Otherwise, the verifier is used to check whether $m$ is valid with respect to $\Phi$ (line
7). If so, then generalize is invoked to produce a maximal mapping constraint $C_{maximal}$,
which represents a largest set that contains $\{m\}$, is contained in $C$, and is valid with
respect to $P$, $Q$, $\Phi$ (line 9). If, on the other hand, $m$ is invalid (i.e., it fails to preserve
$\Phi$), then generalize is invoked to compute a largest superset $C_{bad}$ of $\{m\}$ that contains
only invalid mappings (i.e., those that satisfy $\neg\Phi$) (line 12). The set $C_{bad}$ is then added to $X$ and
used to prune out subsequent invalid candidates (line 13).


1  fun synthesize(P, Q, Φ, C)
2    X ← {}
3    for m ∈ C do
4      if ∃ C_bad ∈ X : m ∈ C_bad then
5        skip
6      end
7      result ← verify({m}, P, Q, Φ)
8      if result = OK then
9        C_maximal ← generalize({m}, P, Q, Φ, C)
10       return C_maximal
11     else
12       C_bad ← generalize({m}, P, Q, ¬Φ, C)
13       X ← X ∪ {C_bad}
14     end
15   end
16   return none
17 end

18 fun generalize(C', P, Q, Φ, C)
19   K ← decompose(C')
20   for k ∈ K do
21     C_relaxed ← relax(C', k)
22     result ← verify(C_relaxed, P, Q, Φ)
23     if result = OK ∧ C_relaxed ⊆ C then
24       C' ← C_relaxed
25     end
26   end
27   return C'
28 end


Fig. 2. An algorithm for synthesizing a maximal mapping constraint. 


Constraint Generalization. The function $\mathit{generalize}(C', P, Q, \Phi, C)$ computes a maximal
set that contains $C'$, is contained within $C$, and only permits mappings that satisfy
$\Phi$. This function is used in two different ways: (1) to identify an undesirable region of
the candidate space that should be avoided, and (2) to produce a maximal version of a
valid mapping constraint.

The procedure works by incrementally growing $C'$ into a larger set $C_{relaxed}$, rejecting
any relaxation for which $C_{relaxed}$ would contain at least one mapping that violates $\Phi$. Suppose that
constraint $C'$ is represented by a symbolic expression that is itself a conjunction of $n$
subexpressions $k_1 \wedge k_2 \wedge \ldots \wedge k_n$, where each $k_i$ for $1 \leq i \leq n$ represents a (possibly
conditional) assignment of a variable or a constant to some label parameter. The function
decompose($C'$) takes the given constraint and returns the set of such subexpressions.
The function relax($C'$, $k_i$) then computes a new constraint by removing $k_i$ from $C'$; this
new constraint, $C_{relaxed}$, is a larger set of mappings that subsumes $C'$.

The verifier is then used to check C_relaxed against Φ (line 22). If C_relaxed is still valid
with respect to Φ, then the implementation decision encoded by k is irrelevant for Φ,
meaning that we can safely remove k from the final synthesized constraint C' (line 24). If
not, k is retained as part of C', and the algorithm moves on to the next subexpression
as a candidate for removal (line 20). On line 23, we also make sure that C_relaxed does not
violate the predefined user constraints C.
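The loop of Fig. 2 can be sketched in Python on a toy, finite candidate space. This is a minimal sketch under stated assumptions, not the authors' Alloy-based tool. A mapping is a full assignment to a few hypothetical concrete parameters (`method`, `nonce`, `session_in`). A constraint is a conjunction of such assignments, so decompose and relax become set operations. The verifier is brute-force enumeration of a Python predicate that stands in for checking P ||_m Q ⊨ Φ. The containment test C_relaxed ⊆ C is done syntactically. That is sound here only because C is itself a conjunction of assignments and every domain has at least two values.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

Conjunct = Tuple[str, str]          # (concrete parameter, chosen value)
Constraint = FrozenSet[Conjunct]    # conjunction of atomic assignments
Mapping = Dict[str, str]
Prop = Callable[[Mapping], bool]    # stands in for "P ||_m Q satisfies Phi"

# Hypothetical finite domains for the concrete parameters of one label.
DOMAINS: Dict[str, List[str]] = {
    "method": ["GET", "POST"],
    "nonce": ["no", "yes"],
    "session_in": ["query", "cookie"],
}

def mappings_of(c: Constraint) -> List[Mapping]:
    """Enumerate every full mapping that satisfies all conjuncts of c."""
    names = sorted(DOMAINS)
    out = []
    for values in product(*(DOMAINS[n] for n in names)):
        m = dict(zip(names, values))
        if all(m[p] == v for p, v in c):
            out.append(m)
    return out

def verify(c: Constraint, prop: Prop) -> bool:
    """OK iff every mapping in the set denoted by c satisfies prop."""
    return all(prop(m) for m in mappings_of(c))

def generalize(c0: Constraint, prop: Prop, user: Constraint) -> Constraint:
    """Lines 18-28: drop conjuncts while the relaxed set stays valid and inside C."""
    current = c0
    for k in sorted(c0):                    # K <- decompose(C')
        relaxed = current - {k}             # relax(C', k)
        # relaxed is a subset of C iff it still contains every conjunct of C.
        if verify(relaxed, prop) and user <= relaxed:
            current = relaxed
    return current

def synthesize(prop: Prop, user: Constraint) -> Optional[Constraint]:
    """Lines 1-17: search C, pruning regions already known to be invalid."""
    bad: List[Constraint] = []              # the set X of C_bad regions
    neg: Prop = lambda m: not prop(m)       # the negated property
    for m in mappings_of(user):             # for m in C
        if any(all(m[p] == v for p, v in cb) for cb in bad):
            continue                        # m lies inside some C_bad: skip it
        single = frozenset(m.items())       # the singleton set {m}
        if verify(single, prop):
            return generalize(single, prop, user)
        bad.append(generalize(single, neg, user))
    return None
```

Take the hypothetical property `nonce == "yes" and session_in == "cookie"` with the user constraint method = POST. The first two candidates are invalid and yield two C_bad regions. The third candidate is skipped without a verifier call, and the fourth is generalized. The `method` conjunct survives generalization only because dropping it would leave C.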
Example. Let abs = a.e.msg be a symbolic label that represents Alice sending a message
(msg) to Eve, and conc = msg'.key be its corresponding label in the public channel
model. Then, one candidate constraint C' for mappings from the high-level to low-level
labels can be specified as the following expression:


$$\chi(a.e.msg,\ msg'.key) \equiv msg = msg' \land (msg = s \Rightarrow key = y) \land (msg = p \Rightarrow key = x)$$


Suppose that this constraint C' has been verified to be valid with respect to P, Q and
Φ. Next, the generalization procedure removes the subexpression k_1 = (msg = p ⇒
key = x) from C', resulting in a constraint C_relaxed that is represented as:


$$\chi(a.e.msg,\ msg'.key) \equiv msg = msg' \land (msg = s \Rightarrow key = y)$$


When checked by the verifier (line 22), C_relaxed is still considered valid, meaning that the
decision encoded by k_1 is irrelevant to the property; thus, k_1 can be safely removed.

However, removing k_2 = (msg = s ⇒ key = y) results in a violation of the property.
Thus, k_2 is kept as part of the final maximal constraint expression.
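The example can be replayed with conditional conjuncts encoded as predicates. This is a minimal sketch with several assumptions. The domains are small and hypothetical (msg ∈ {s, p}, key ∈ {x, y}), and `msg_c` stands for msg'. The property `phi` is hypothetical, chosen only so that the outcome matches the text. The names `valid` and `generalize` are illustrative, not the tool's API.

```python
from itertools import product
from typing import Callable, Dict, List

Valuation = Dict[str, str]
Pred = Callable[[Valuation], bool]

# Hypothetical domains; msg_c plays the role of msg' in the concrete label.
DOMAINS: Dict[str, List[str]] = {"msg": ["s", "p"], "msg_c": ["s", "p"], "key": ["x", "y"]}

def implies(a: bool, b: bool) -> bool:
    return (not a) or b

# The conjuncts of C' from the example, by name.
CONJUNCTS: Dict[str, Pred] = {
    "eq": lambda v: v["msg"] == v["msg_c"],                       # msg = msg'
    "k2": lambda v: implies(v["msg"] == "s", v["key"] == "y"),    # k_2
    "k1": lambda v: implies(v["msg"] == "p", v["key"] == "x"),    # k_1
}

def phi(v: Valuation) -> bool:
    """Hypothetical property: msg passes through unchanged and s is sent only under y."""
    return v["msg"] == v["msg_c"] and implies(v["msg"] == "s", v["key"] == "y")

def valuations() -> List[Valuation]:
    names = list(DOMAINS)
    return [dict(zip(names, vals)) for vals in product(*(DOMAINS[n] for n in names))]

def valid(kept: List[str]) -> bool:
    """verify: every valuation allowed by the kept conjuncts satisfies phi."""
    return all(phi(v) for v in valuations() if all(CONJUNCTS[c](v) for c in kept))

def generalize(order: List[str]) -> List[str]:
    kept = list(CONJUNCTS)
    for k in order:
        relaxed = [c for c in kept if c != k]   # relax(C', k)
        if valid(relaxed):
            kept = relaxed                      # decision k is irrelevant to phi
    return kept
```

Here `generalize(["k1", "k2", "eq"])` drops k_1 but keeps both k_2 and msg = msg', so it returns `["eq", "k2"]`.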


5 Implementation and Case Studies 


5.1 Implementation 


We have built a prototype implementation of the synthesis algorithm described in 
Sect. 4. Our tool uses the Alloy Analyzer [25] as the underlying modeling and veri- 
fication engine. Alloy’s flexible, declarative relational logic is convenient for encoding 
the semantics of the mapping composition as well as specifying mapping constraints. 
The analysis engine for Alloy uses an off-the-shelf SAT solver to perform bounded 
verification [25]. In particular, our current prototype can synthesize mappings that
preserve two types of properties, which can be expressed in the forms
∃t : t ∈ T_P ∧ t ∈ φ (reachability) and ¬∃t : t ∈ T_P ∧ t ∉ φ (safety),
for some process P and property φ.
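To make the two property forms concrete, the following Python sketch checks them over an explicitly enumerated, length-bounded set of traces. It is an assumption-laden stand-in for what Alloy does symbolically with a SAT solver. The labelled transition system, its state names, and the bound are hypothetical. Traces longer than the bound are simply not considered, which is what makes the analysis bounded.

```python
from typing import Callable, Dict, List, Tuple

Trace = Tuple[str, ...]
LTS = Dict[str, List[Tuple[str, str]]]   # state -> [(label, successor)]

def bounded_traces(lts: LTS, init: str, bound: int) -> List[Trace]:
    """All label sequences of length <= bound from init: a finite stand-in for T_P."""
    out: List[Trace] = [()]
    frontier: List[Tuple[Trace, str]] = [((), init)]
    for _ in range(bound):
        frontier = [(tr + (lab,), nxt) for tr, s in frontier for lab, nxt in lts.get(s, [])]
        out.extend(tr for tr, _ in frontier)
    return out

def reachability(traces: List[Trace], phi: Callable[[Trace], bool]) -> bool:
    return any(phi(t) for t in traces)           # exists t in T_P with t in phi

def safety(traces: List[Trace], phi: Callable[[Trace], bool]) -> bool:
    return not any(not phi(t) for t in traces)   # no t in T_P with t outside phi
```

Consider a linear system whose transitions are labelled initiate, authorize, forward, getToken. For it, the trace that completes all four steps becomes reachable only once the bound is at least 4.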


[Fig. 3 panels. (a) OAuth 2.0, among User (Alice or Eve), Client, and AuthServer: 1. initiate(ret_session), 2. authorize(userid, pwd, ret_code), 3. forward(code, session), 4. getToken(code, ret_token). (b) OAuth 1.0: 1. initiate(ret_session, ret_reqToken), 2. getReqToken(ret_reqToken), 3. authorize(userid, pwd, reqToken), 4. notify(reqToken), 5. getAccessToken(reqToken, ret_accessToken).]


Fig. 3. A high-level overview of the two OAuth protocols, with a sequence of event labels that 
describe protocol steps in the typical order that they occur. Each arrowed edge indicates the direc- 
tion of the communication. Variables inside labels with the prefix ret_ represent return parame- 
ters. For example, in Step 2 of OAuth 2.0, User passes their user ID and password as arguments 
to AuthServer, which returns ret_code back to User in response. 


2 The tool, along with the models used in our case studies, is available at
https://github.com/eskang/MappingSynthesisTool.
However, our synthesis approach does not prescribe the use of a particular modeling
and verification engine, and can be implemented using other tools as well (such as an 
SMT solver [11, 12]). 


5.2 Case Studies: OAuth Protocols 


As two major case studies, we took on the problem of synthesizing valid mappings 
for OAuth 1.0 and OAuth 2.0, two real-world protocols used for third-party authoriza- 
tion [24]. The purpose of the OAuth protocol family in general is to allow an application 
(called a client in the OAuth terminology) to access a resource from another applica- 
tion (an authorization server) without needing the credentials of the resource owner 
(a user). For example, a gaming application may initiate an OAuth process to obtain 
a list of friends from a particular user’s Facebook account, provided that the user has 
authorized Facebook to release this resource to the client. 

OAuth 2.0 is the newer version of the protocol, while OAuth 1.0 is an older version. 
Although OAuth 2.0 is intended to be a replacement for OAuth 1.0, there has been much 
contention within the developer community about whether it actually improves over its 
predecessor in terms of security [17]. Since both protocols are designed to provide the 
same security guarantees (i.e., both share common properties), our goal was to apply 
our synthesis approach to systematically compare what developers would be required 
to do in order to construct secure web-based implementations of the two. 


5.3 Formal Modeling 


For our case studies, we constructed the following set of Alloy models: (1) model P_1.0
representing OAuth 1.0; (2) model P_2.0 representing OAuth 2.0; (3) model Q representing
generic HTTP interactions between a browser and a server, as well as the behavior of
a web-based attacker; (4) specification Φ describing desired protocol properties (the same
for both OAuth 1.0 and 2.0); and (5) mapping constraints C_1.0 and C_2.0 representing initial,
user-specified partial mappings for OAuth 1.0 and 2.0, respectively. The complete
models are approximately 1800 lines of Alloy code in total, and took around 4 man-months
to build. These models were then provided as inputs to our tool to solve two
instances of Problem 4 from Sect. 3. In particular, we synthesized a maximal mapping
constraint C'_1.0 such that every m ∈ C'_1.0 ensures that P_1.0 ||_m Q ⊨ Φ, and a maximal
mapping constraint C'_2.0 such that every m ∈ C'_2.0 ensures that P_2.0 ||_m Q ⊨ Φ.


OAuth Models (P_1.0, P_2.0). We constructed Alloy models of OAuth 1.0 and 2.0 based
on the official protocol specifications [23,24]. Due to limited space, we give only a brief
overview of the models. Each model consists of four processes: Client, AuthServer, and
two users, Alice and Eve (the latter with a malicious intent to access Alice's resources).
A typical OAuth 2.0 workflow, shown in Fig. 3(a), begins with a user (Alice or Eve)
initiating a new protocol session with Client (initiate). The user is then asked to prove
their own identity to AuthServer (by providing a user ID and a password) and to officially
authorize the client to access their resources (authorize). Given the user's authorization,
the server allocates a unique code for the user and redirects them back to the
client. The user forwards the code to the client (forward), which can then exchange the
code for an access token to the user's resources (getToken).

Like in OAuth 2.0, a typical workflow in OAuth 1.0 (depicted in Fig. 3(b)) begins 
with a user initiating a new session with Client (initiate). Instead of immediately direct- 
ing the user to AuthServer, however, Client first obtains a request token from Auth- 
Server and associates it with the current session (getReqToken). The user is then asked 
to present the same request token to AuthServer and authorize Client to access their 
resources (authorize). Once notified by the user that the authorization step has taken 
place (notify), Client exchanges the request token for an access token that can be used 
subsequently to access their resources (getAccessToken). 


Specification (Φ). There are two desirable properties of OAuth protocols in general:
(1) Authenticity: When the client receives an access token, it must correspond to the
user who initiated the current protocol session. (2) Completion: There exists at least
one trace in which the protocol interactions are carried out to completion in the order of
steps described in Fig. 3. Authenticity is a safety property, while completion is a reachability
property. The input specification Φ consists of these two properties. Completion is
essential for ruling out mappings that over-constrain the resulting implementation and
prevent certain steps of the protocol from being performed.
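As a toy illustration of the two properties (not the authors' Alloy encoding), the sketch below represents a trace as a list of (label, parameters) events. Authenticity is checked as a safety property over every event of a trace. Completion is checked as reachability over a set of traces. The event parameters are simplifying assumptions. In particular, getToken carries the session under which the client received the code.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, Dict[str, str]]       # (label, parameters)
Trace = List[Event]

STEPS = ["initiate", "authorize", "forward", "getToken"]   # order of Fig. 3(a)

def authentic(trace: Trace) -> bool:
    """Safety: each token the client obtains belongs to the session's initiator."""
    initiator: Dict[str, str] = {}       # session -> user who initiated it
    owner: Dict[str, str] = {}           # code -> user who authorized it
    for label, p in trace:
        if label == "initiate":
            initiator[p["session"]] = p["user"]
        elif label == "authorize":
            owner[p["code"]] = p["user"]
        elif label == "getToken" and initiator.get(p["session"]) != owner.get(p["code"]):
            return False
    return True

def completes(traces: List[Trace]) -> bool:
    """Reachability: some trace carries out all steps, in order."""
    return any([label for label, _ in t] == STEPS for t in traces)
```

A trace in which Alice initiates the session but the exchanged code was authorized by Eve fails `authentic`. This is exactly the session-swapping pattern discussed later in this section.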


HTTP Platform Model (Q). Our goal was to explore and synthesize web-based imple- 
mentations of OAuth. For this purpose, we constructed a formal model depicting inter- 
actions between a generic HTTP server and web browser. The model contains two types 
of processes, Server and Browser (which may be instantiated into multiple processes 
representing different servers and browsers). They interact with each other over HTTP 
requests, which share the following signature: 


req(method : Method, url : URL, headers : List[Header], body : Body, ret_resp : Resp)


The parameters of an HTTP request have their own internal structures, each consisting 
of its own parameters as follows: 


url(host : Host, path : Path, queries : List[Query])
header(name : Name, val : Value)
resp(status : Status, headers : List[Header], body : Body)


initiate(ret_session) ↦ req(GET, http://client.com/initiate?queries, headers, body, ret_resp(OK, [set-cookie: ret_session], body))

forward(code, session) ↦ req(POST, http://client.com/forward?queries, headers, body, ret_resp(OK, [], body))

authorize(userid, pwd, ret_code) ↦ req(POST, http://server.com/authorize?queries, headers, body, ret_resp(Redirect, headers, body))

getToken(code, ret_token) ↦ req(GET, http://client.com/getToken?[code], headers, body, ret_resp(OK, [], ret_token))


Fig. 4. User-specified partial mappings from OAuth 2.0 to HTTP. Terms highlighted in blue and
red are variables that represent the parameters inside OAuth and HTTP labels, respectively. For
example, in forward, the abstract parameters code and session may be transmitted as part of a
URL query, a header, or the request body, although its URL is fixed to http://client.com/forward.
(Color figure online)
Our model describes generic, application-independent HTTP interactions. In particular,
each Browser process is a machine that constructs, at each communication step
with Server, an arbitrary HTTP request by non-deterministically selecting a value for 
each parameter of the request. The processes, however, follow a platform-specific logic; 
for instance, when given a response from Server that instructs a browser cookie to be 
stored at a particular URL, Browser will include this cookie along with every subsequent 
request directed at that URL. In addition, the model includes a process that depicts the 
behavior of a web attacker, who may operate their own malicious server and exploit 
weaknesses in a browser to manipulate the user into sending certain HTTP requests. 
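The HTTP signatures and the cookie rule described above can be sketched with Python dataclasses. This is a simplified sketch with the following assumptions: fields are string-valued, and cookies are scoped by host rather than by full URL. The response is also passed separately rather than through the ret_resp parameter. `Browser`, `receive`, and `send` are hypothetical names. The sketch shows the property the attack in this section relies on: the page chooses the request, but the browser attaches the stored cookie.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Header:
    name: str
    val: str

@dataclass
class URL:
    host: str
    path: str
    queries: List[str] = field(default_factory=list)

@dataclass
class Req:
    method: str
    url: URL
    headers: List[Header] = field(default_factory=list)
    body: str = ""

@dataclass
class Resp:
    status: str
    headers: List[Header] = field(default_factory=list)
    body: str = ""

class Browser:
    """Stores cookies per host and attaches them to every later request to that host."""

    def __init__(self) -> None:
        self.cookies: Dict[str, List[str]] = {}

    def receive(self, url: URL, resp: Resp) -> None:
        for h in resp.headers:
            if h.name == "set-cookie":
                self.cookies.setdefault(url.host, []).append(h.val)

    def send(self, method: str, url: URL, body: str = "") -> Req:
        # The page (possibly an attacker's) picks method, URL and body;
        # the browser itself adds the cookies stored for the target host.
        headers = [Header("cookie", c) for c in self.cookies.get(url.host, [])]
        return Req(method, url, headers, body)
```

Suppose the browser has received `set-cookie: session_Alice` from client.com. Any later request to client.com then carries that cookie, including one triggered by another site's page.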


Mapping Constraint (C_1.0, C_2.0). Building a web-based implementation of OAuth
involves decisions about how abstract protocol operations are to be realized in terms of 
HTTP requests. As an input to the synthesizer, we specified an initial set of constraints 
that describe partial implementation decisions for both OAuth protocols; the ones for 
OAuth 2.0 are shown in Fig.4. These decisions include a designation of fixed host 
and path names inside URLs for various OAuth operations (e.g., http://client.com/initiate
for the OAuth initiate event), and how certain parameters are transmitted as part of an 
HTTP request (ret_session as a return cookie in initiate). It is reasonable to treat these 
constraints as given, since they describe decisions that are common across typical web- 
based OAuth implementations. 


Insecure Mapping for OAuth 2.0. Let us now give an example of an insecure mapping 
that satisfies the user-given constraint in Fig. 4 but could introduce a security vulnera- 
bility into the resulting implementation. Later in Sect. 5.4, we describe how our tool can 
be used to synthesize a secure mapping that prevents this vulnerability. 

Consider the OAuth 2.0 workflow from Fig. 3(a). In order to implement the forward 
operation, for instance, the developer must determine how the parameters code and 
session of the abstract event label are encoded using their concrete counterparts in an 
HTTP request. A number of choices are available. In one possible implementation, the
authorization code may be transmitted as a query parameter inside the URL, and the
session as a browser cookie, as described by the following constraint expression, χ_1:


$$
\begin{aligned}
\chi_1(a,b) \equiv\ & (b.method = POST) \land (b.url.host = client.com) \land {} \\
& (b.url.path = forward) \land (b.url.queries[0] = a.code) \land {} \\
& (b.headers[0].name = cookie) \land (b.headers[0].value = a.session)
\end{aligned}
$$


where POST, client.com, forward, and cookie are predefined constants, and l[i] refers to
the i-th element of list l.

This constraint, however, allows a vulnerable implementation in which the malicious user
Eve performs the first two steps of the workflow in Fig. 3(a) using her own credentials,
and obtains a unique code (code_Eve) from the authorization server. Instead of forwarding
this to Client (as she is expected to), Eve keeps the code herself, and crafts her own
web page that triggers the visiting browser to send the following HTTP request:


req(POST, http://client.com/forward?code_Eve, ...)


Suppose that Alice is a naive browser user who may occasionally be enticed or tricked
into visiting malicious web sites. When Alice visits the page set up by Eve, Alice's
browser automatically generates the above HTTP request, which, given the decisions in
χ_1, corresponds to a valid forward event:


forward(code_Eve, session_Alice) ↦
req(POST, http://client.com/forward?code_Eve, [(cookie, session_Alice)], ...)


Due to the standard browser logic, the cookie corresponding to session_Alice is included
in every request to client.com. As a result, Client mistakenly accepts code_Eve as the one
for Alice, even though it belongs to Eve, violating the authenticity property of OAuth
(this attack is also called session swapping [39]).


5.4 Results 


Our synthesis tool was able to generate valid mapping constraints for both OAuth pro- 
tocols. In particular, the constraints describe mitigations against attacks that exploit an 
interaction between the OAuth logic and security vulnerabilities in a web browser. 


OAuth 2.0. The synthesized symbolic mapping constraint for OAuth 2.0 consists of 39 
conjuncts in total, each capturing a (conditional) assignment of a concrete HTTP param- 
eter to a constant (e.g., b.url.path = forward) or an abstract OAuth parameter (e.g., 
b.url.queries[0] = a.code). In particular, the constraint captures mitigations against
session swapping [39] and covert redirect [16]. Due to limited space, we omit the full
constraint, but instead describe how the vulnerability described at the end of Sect. 5.3 
can be mitigated by our synthesized mapping. 

Consider the insecure mapping expression χ_1 from Sect. 5.3. The mapping constraint
synthesized by our tool, χ_2, fixes the major problem of χ_1; namely, that in a
browser-based implementation, the client cannot trust an authorization code as having
originated from a particular user (e.g., Alice), since the code may be intercepted or
interjected by an attacker (Eve) while in transit through a browser. A possible solution
is to explicitly identify the origin of the code by requiring an additional piece of tracking
information to be provided in each forward request. The mapping expression χ_2
synthesized by our tool encodes one form of this solution:


$$
\begin{aligned}
\chi_2(a,b) \equiv\ & \chi_1(a,b) \land (a.session = session_{Alice} \Rightarrow b.url.queries[1] = nonce_0) \land {} \\
& (a.session = session_{Eve} \Rightarrow b.url.queries[1] = nonce_1)
\end{aligned}
$$


where nonce_0, nonce_1 ∈ Nonce are constants defined in the HTTP model³. In particular,
χ_2 stipulates that every forward request must include an additional value (a nonce)
as an argument besides the code and the session, and that this nonce be unique for
each session value. χ_2 ensures that the resulting implementation satisfies the desired
properties of OAuth 2.0.
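The difference between χ_1 and χ_2 can be checked directly on the forged request from Sect. 5.3. This minimal sketch makes several assumptions. Requests are plain dictionaries, and headers are (name, value) pairs. The nonce assignment `NONCE_OF` is hypothetical, and Eve's page is assumed to be able to include only her own nonce. Under χ_1 the forged request matches forward(code_Eve, session_Alice). Under χ_2 it does not, because the nonce in the request does not belong to session_Alice.

```python
from typing import Any, Dict

Request = Dict[str, Any]   # method, host, path, queries (list), headers (list of pairs)

def chi1(a: Dict[str, str], b: Request) -> bool:
    """chi_1: code in the first URL query, session in the first (cookie) header."""
    return (b["method"] == "POST" and b["host"] == "client.com"
            and b["path"] == "forward"
            and b["queries"][:1] == [a["code"]]
            and b["headers"][:1] == [("cookie", a["session"])])

# Hypothetical binding of nonce constants to session values, as in chi_2.
NONCE_OF = {"session_Alice": "nonce_0", "session_Eve": "nonce_1"}

def chi2(a: Dict[str, str], b: Request) -> bool:
    """chi_2: chi_1 plus a second URL query holding the nonce of a.session."""
    return chi1(a, b) and b["queries"][1:2] == [NONCE_OF.get(a["session"])]
```

For the forged request with queries `["code_Eve", "nonce_1"]` and Alice's cookie, `chi1` accepts the pairing with forward(code_Eve, session_Alice) but `chi2` rejects it. No other session value matches Alice's cookie.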


OAuth 1.0. The synthesized symbolic mapping constraint for OAuth 1.0 consists of 48 
conjuncts in total, capturing how the abstract parameters of the five OAuth 1.0 opera- 
tions are related to concrete HTTP parameters. The constraint synthesized by our tool 


3 A nonce is a unique string intended to be used only once in communication.
Protocol  | # total candidates | # explored | # verified | # skipped | Verification time, avg. | Verification time, total | Generalization time | Total time
OAuth 1.0 | 79200              | 2465       | 281        | 2184      | 2.01                    | 566.05                   | 490.84              | 1056.89
OAuth 2.0 | 29400              | 1453       | 161        | 1292      | 1.88                    | 302.76                   | 1138.85             | 1441.60


Fig. 5. Experimental results (all times in seconds). "# total candidates" is the total number of
possible symbolic mapping expressions; "# explored" is the number of iterations taken by the
main synthesis loop (lines 3-15, Fig. 2) before a solution was found. Out of these iterations,
"# verified" mappings were verified (line 7), while the rest were identified as invalid and skipped
(line 5). "Total time" (the sum of the total verification and generalization times) refers to the
time spent by the tool to synthesize a maximal constraint.


for OAuth 1.0 encodes a mitigation against the session fixation [15] attack; in short, 
this mitigation involves strengthening the notify operation with unique nonces (similar 
to the way the forward operation in OAuth 2.0 was fixed above) to prevent the attacker 
from violating the authenticity property. 


Performance. Figure 5 shows experimental results for the two OAuth protocols⁴. Overall,
the synthesizer took approximately 17.6 and 24.0 min to synthesize the constraints
for OAuth 1.0 and 2.0, respectively. In both cases, the tool spent a considerable amount of time
on the generalization step to learn the invalid regions of the search space. Note that
generalization is effective at identifying and discarding a very large number of invalid
candidates; it was able to skip 2184 out of 2465 candidates for OAuth 1.0 (88.6%) and
1292 out of 1453 for OAuth 2.0 (88.9%). Our generalization technique was particularly
effective for the OAuth protocols, since a significant percentage of the candidate
constraints would result in an implementation that violates the completion property (i.e.,
it prevents Alice or Eve from completing a protocol session in the expected order). Often,
the decisions contributing to this violation could be localized to a small subset of entries 
in a mapping (for example, attempting to send a cookie to a mismatched URL, which is 
inconsistent with the behavior of the browser process). By identifying this subset, our 
algorithm was able to discover and eliminate a large number of invalid mappings. 
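The derived figures in this paragraph follow from the table in Fig. 5. The short computation below uses only the table's values and reproduces the skip percentages, the average verification times, and the total runtimes in minutes. The total for OAuth 2.0 is recomputed from its two components (1441.61 s versus 1441.60 s in the table, a rounding difference).

```python
# Values copied from the table in Fig. 5 (times in seconds).
RESULTS = {
    "OAuth 1.0": {"explored": 2465, "verified": 281, "verif_total_s": 566.05, "gen_s": 490.84},
    "OAuth 2.0": {"explored": 1453, "verified": 161, "verif_total_s": 302.76, "gen_s": 1138.85},
}

def summary(r: dict) -> dict:
    skipped = r["explored"] - r["verified"]          # candidates pruned via X
    return {
        "skipped": skipped,
        "skipped_pct": round(100 * skipped / r["explored"], 1),
        "avg_verif_s": round(r["verif_total_s"] / r["verified"], 2),
        "total_min": round((r["verif_total_s"] + r["gen_s"]) / 60, 1),
    }

for name, r in RESULTS.items():
    print(name, summary(r))
```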


6 Related Work 


Our approach has been inspired by the success of recent synthesis paradigms such as 
sketching [36-38], oracle-guided synthesis [26] and syntax-guided synthesis [3]. Our 
technique shares many similarities with these approaches in that (1) it allows the user to
provide a partial specification of the artifact to be synthesized (in the form of constraints
or examples) and has the underlying engine complete the remaining parts; and (2)
it relies on an interaction between the verifier, which checks candidate solutions, and
the synthesizer, which prunes the search space based on previous invalid candidates.
Our work also differs in a number of aspects. First, we synthesize mappings from high- 
level models to low-level execution platforms, which to our knowledge has not been 


4 The experiments were performed on a 2.7 GHz Mac OS X laptop with 8 GB RAM, with
MiniSat [13] as the underlying SAT solver employed by the Alloy Analyzer.
considered before. Second, our approach leverages constraint generalization to not only
prune the search space, but also to produce a constraint capturing a (locally) maximal 
set of valid mappings. Third, our application domain is in security protocols. 

A large body of literature exists on refinement-based methods to system construc- 
tion [4,20]. These approaches involve building an implementation Q that is a behavioral 
refinement of P; such Q, by construction, would satisfy the properties of P. In compari- 
son, we start with an assumption that Q is a given platform, and that the developer may 
not have the luxury of being able to modify or build Q from scratch. Thus, instead of 
behavioral refinement (which may be too challenging to achieve), we aim to preserve 
some critical property Φ when P is implemented using Q.

The task of synthesizing a valid mapping can be seen as a type of model merging
problem [8]. This problem has been studied in various contexts, including architectural 
views [31], behavioral models [6,32,40], and database schemas [34]. Among these, our 
work is most closely related to merging of partial behavioral models [6,40]. In these 
works, given a pair of models M_1 and M_2, the goal is to construct M' that is a behavioral
refinement of both M_1 and M_2. The approach proposed in this paper differs in that (1)
the mapping composition involves merging a pair of events with distinct alphabet labels
into a single event that retains all of those labels, and (2) the composed process (P ||_m Q)
need not be a behavioral refinement of P or Q, as long as it satisfies property Φ.

Bhargavan and his colleagues present a compiler that takes a high-level program
written using session types [22] and automatically generates a low-level implemen- 
tation [7]. This technique is closer to compilation than to synthesis in that it uses a 
fixed translation scheme from high-level to low-level operations in a specific language 
environment (.NET), without searching a space of possible translations. Synthesizing a 
low-level implementation from a high-level specification has also been studied in the 
context of data structures [18,19], although their underlying representation (relational 
algebra for data schema specification) is very different from ours (process algebra). 

A significant contribution of our work is the production of formal models for real- 
world protocols such as OAuth and HTTP. There have been similar efforts by other 
researchers in building reusable models of the web for security analysis [1,5,14]. As 
far as we know, however, none of these models has been used for synthesis. 


7 Conclusions 


In this paper, we have proposed a novel system design methodology centered around 
the notion of mappings. We have presented novel mapping synthesis problems and an 
algorithm for efficiently synthesizing symbolic maximal valid mappings. In addition, 
we have validated our approach on realistic case studies involving the OAuth protocols. 

Future directions include performance improvements (e.g., exploiting the fact 
that our generalization-based algorithm is easily parallelizable), combining our 
generalization-based synthesis method with a counter-example guided approach, and 
application of our synthesis approach to other domains besides security (e.g., platform-based
design and embedded systems [35]).
Abstract. The unrealizability of a specification is often due to the
assumption that the behavior of the environment is unrestricted. In this 
paper, we present algorithms for synthesis in bounded environments, 
where the environment can only generate input sequences that are ulti- 
mately periodic words (lassos) with finite representations of bounded 
size. We provide automata-theoretic and symbolic approaches for solv- 
ing this synthesis problem, and also study the synthesis of approximative 
implementations from unrealizable specifications. Such implementations 
may violate the specification in general, but are guaranteed to satisfy the 
specification on at least a specified portion of the bounded-size lassos. 
We evaluate the algorithms on different arbiter specifications. 


1 Introduction 


The objective of reactive synthesis is to automatically construct an implementa- 
tion of a reactive system from a high-level specification of its desired behaviour. 
While this idea holds a great promise, applying synthesis in practice often faces 
significant challenges. One of the main hurdles is that the system designer has to 
provide the right formal specification, which is often a difficult task [12]. In par- 
ticular, since the system being synthesized is required to satisfy its requirements 
against all possible environments allowed by the specification, accurately cap- 
turing the designer’s knowledge about the environment in which the system will 
execute is crucial for being able to successfully synthesize an implementation. 
Traditionally, environment assumptions are included in the specification, usually
given as a temporal logic formula. There are, however, less explored ways
of incorporating information about the environment, one of which is to consider
a bound on the size of the environment, that is, a bound on the size of
the state space of a transition system that describes the possible environment
behaviours. Restricting the space of possible environments can turn an unrealizable
specification into a realizable one. Temporal synthesis under such
bounded environments was first studied in [6], where the authors extensively
study the problem, in several versions, from the complexity-theoretic point of
view.

In this paper, we follow a similar avenue of providing environment assump- 
tions. However, instead of bounding the size of the state space of the environ- 
ment, we associate a bound with the sequences of values of input signals produced 
by the environment. The infinite input sequences produced by a finite-state environment
that interacts with a finite-state system are ultimately periodic, and
thus each such infinite sequence σ ∈ Σ_I^ω over the input alphabet Σ_I can be
represented as a lasso, which is a pair (u, v) of finite words u ∈ Σ_I^* and v ∈ Σ_I^+
such that σ = u · v^ω. It is the length of such lassos that we bound.
More precisely, given a bound k ∈ ℕ, we consider the language of all infinite
sequences of inputs that can be represented by a lasso (u, v) with |u · v| = k.
The goal of the synthesis of lasso-precise implementations is then to synthesize a
system for which each execution resulting from a sequence of environment inputs
in that language satisfies a given linear temporal specification.
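Lassos are simple to manipulate directly. The Python sketch below uses illustrative function names. It unrolls u · v^ω position by position and enumerates all lassos of size k. Every word of length k can be split into u and a non-empty v in k ways, so there are k · |Σ_I|^k such pairs. Distinct pairs may denote the same infinite word; for instance, (ε, aa) and (a, a) both denote a^ω. For the arbiter below, Σ_I = 2^{r_1, r_2} has four letters.

```python
from itertools import product
from typing import List, Sequence, Tuple

Lasso = Tuple[Tuple[str, ...], Tuple[str, ...]]   # (u, v) with v non-empty

def letter_at(lasso: Lasso, t: int) -> str:
    """Letter at position t of the infinite word u . v^omega."""
    u, v = lasso
    return u[t] if t < len(u) else v[(t - len(u)) % len(v)]

def lassos_of_size(alphabet: Sequence[str], k: int) -> List[Lasso]:
    """All pairs (u, v) with |u . v| = k and |v| >= 1."""
    return [(w[:s], w[s:])
            for w in product(alphabet, repeat=k)   # the word u . v
            for s in range(k)]                     # |u| = s, so |v| = k - s >= 1
```

For example, `[letter_at((("a",), ("b", "c")), t) for t in range(5)]` yields a, b, c, b, c.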

As an example, consider an arbiter serving two client processes. Each client 
issues a request when it wants to access a shared resource, and keeps the request 
signal up until it is done using the resource. The goal of the arbiter is to ensure 
the classical mutual exclusion property, by not granting access to the two clients 
simultaneously. The arbiter has to also ensure that each client request is even- 
tually granted. This, however, is difficult since, first, a client might gain access 
to the resource and never lower the request signal, and second, the arbiter is not 
allowed to take away a grant unless the request has been set to false, or the client 
never sets the request to false in the future (the client has become unrespon- 
sive). The last two requirements together make the specification unrealizable, as 
the arbiter has no way of determining if a client has become unresponsive, or 
will lower the request signal in the future. If, however, the length of the lassos 
of the input sequences is bounded, then, after a sufficient number of steps, the 
arbiter can assume that if the request has not been set to false, then it will not 
be lowered in the future either, as the sequence of inputs must already have 
run at least once through its period, which will be repeated forever from that
point on. 

Formally, we can express the requirements on the arbiter in Linear Temporal
Logic (LTL) as follows. There is one input variable r_i (for request) and one output
variable g_i (for grant) associated with each client i. The specification is then given
as the conjunction φ = φ_mutex ∧ φ_resp ∧ φ_rel, where we use the LTL operators
Next ◯, Globally □ and Eventually ◇ to define the requirements


Pmutexr = O-7(91 A 92), 
Yresp = ON (ri > OM); 
A 


Prel =U 
Due to the requirement not to revoke grants stated in φ_rel, the specification
φ is unrealizable (that is, there exists no implementation for the arbiter process).
For any bound k on the length of the input lassos, however, φ is realizable. More
precisely, there exists an implementation in which, once client i has not lowered
the request signal for k consecutive steps, the variable g_i is set to false.
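The argument can be tested on small bounds. The sketch below is a hypothetical arbiter strategy in the spirit of the implementation described above, not one synthesized in the paper. It grants the resource to one requesting client at a time and revokes a grant once the request has stayed up for k consecutive steps. Clients 1 and 2 are indexed 0 and 1. The key fact is that if |u · v| = k, any k consecutive positions of u · v^ω cover a full period v. A request that stays up for k steps therefore stays up forever, so such a revocation is permitted.

```python
from itertools import product
from typing import List, Optional, Tuple

Letter = Tuple[bool, bool]                              # values of (r_1, r_2)
Lasso = Tuple[Tuple[Letter, ...], Tuple[Letter, ...]]   # (u, v), v non-empty

def all_lassos(k: int) -> List[Lasso]:
    """All input lassos over 2^{r_1, r_2} with |u . v| = k."""
    alphabet = list(product([False, True], repeat=2))
    return [(w[:s], w[s:]) for w in product(alphabet, repeat=k) for s in range(k)]

def letter(lasso: Lasso, t: int) -> Letter:
    """Input at position t of u . v^omega."""
    u, v = lasso
    return u[t] if t < len(u) else v[(t - len(u)) % len(v)]

def high_forever(lasso: Lasso, i: int, t: int) -> bool:
    """True iff request i holds at every position >= t of u . v^omega."""
    u, v = lasso
    return all(x[i] for x in u[t:]) and all(x[i] for x in v)

def arbiter_run(lasso: Lasso, k: int, steps: int) -> List[Tuple[bool, bool]]:
    """Hypothetical arbiter: one grant at a time, revoked after k high steps."""
    holder: Optional[int] = None
    held = 0                          # consecutive steps the holder's request was up
    last = 1                          # last client granted, for alternation
    grants: List[Tuple[bool, bool]] = []
    for t in range(steps):
        r = letter(lasso, t)
        if holder is not None:
            if not r[holder] or held >= k:
                holder = None         # released, or revoked as unresponsive
            else:
                held += 1
        if holder is None:
            for i in (1 - last, last):    # offer the other client first
                if r[i]:
                    holder, held, last = i, 1, i
                    break
        grants.append((holder == 0, holder == 1))
    return grants
```

Looping over `all_lassos(3)` and comparing the output of `arbiter_run` with `high_forever` confirms two things for this strategy. Mutual exclusion always holds, and a grant is taken away from a still-requesting client only when that request stays up forever.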

This example shows that when the system designer has knowledge about 
the resources available to the environment processes, taking this knowledge 
into account can enable us to synthesize a system that is correct under this 
assumption. 

In this paper we formally define the synthesis problem for lasso-precise imple- 
mentations, that is, implementations that are correct for input lassos of bounded 
size, and describe an automata-theoretic approach to this synthesis problem. We 
also consider the synthesis of lasso-precise implementations of bounded size, and 
provide a symbolic synthesis algorithm based on quantified Boolean satisfiability. 

Bounding the size of the input lassos can render some unrealizable specifica- 
tions realizable, but, similarly to bounding the size of the environment, comes 
at the price of higher computational complexity. To alleviate this problem, we 
further study the synthesis of approximate implementations, where we relax the 
synthesis problem further, and only require that, for a given ε > 0, the ratio
of input lassos of a given size for which the specification is satisfied to the
total number of input lassos of that size is at least 1 − ε. We then propose an
approximate synthesis method based on maximum model counting for Boolean 
formulas [5]. The benefits of the approximate approach are two-fold. Firstly, it 
can often deliver high-quality approximate solutions more efficiently than the 
lasso-precise synthesis method, and secondly, even when the specification is still 
unrealizable for a given lasso bound, we might be able to synthesize an imple- 
mentation that is correct for a given fraction of the possible input lassos. 
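For tiny bounds, the approximation criterion can be checked by brute force. The sketch below is illustrative only. It enumerates lassos explicitly instead of using the maximum model counting of [5], and the alphabet is a single request bit. The predicate `ok` stands for a hypothetical implementation that is correct exactly when the request is lowered somewhere in the period v.

```python
from fractions import Fraction
from itertools import product
from typing import Callable, List, Sequence, Tuple

Lasso = Tuple[Tuple[str, ...], Tuple[str, ...]]

def all_lassos(alphabet: Sequence[str], k: int) -> List[Lasso]:
    """All (u, v) with |u . v| = k and v non-empty."""
    return [(w[:s], w[s:]) for w in product(alphabet, repeat=k) for s in range(k)]

def satisfied_ratio(alphabet: Sequence[str], k: int,
                    ok: Callable[[Lasso], bool]) -> Fraction:
    """Exact fraction of size-k input lassos on which `ok` holds."""
    lassos = all_lassos(alphabet, k)
    return Fraction(sum(1 for l in lassos if ok(l)), len(lassos))

def is_approximate(alphabet: Sequence[str], k: int,
                   ok: Callable[[Lasso], bool], eps: Fraction) -> bool:
    """The relaxed requirement: the ratio is at least 1 - eps."""
    return satisfied_ratio(alphabet, k, ok) >= 1 - eps

def ok(lasso: Lasso) -> bool:
    """Hypothetical implementation: correct iff "0" (request low) occurs in v."""
    return "0" in lasso[1]
```

For k = 2 this predicate holds on 5 of the 8 lassos. The implementation therefore qualifies for ε = 1/2 but not for ε = 1/4.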

The rest of the paper is organized as follows. In Sect. 2 we discuss related work
on environment assumptions in synthesis. In Sect. 3 we provide preliminaries
on linear temporal properties and omega-automata. In Sect. 4 we define the
synthesis problem for lasso-precise implementations, and describe an automata-theoretic
synthesis algorithm. In Sect. 5 we study the synthesis of lasso-precise
implementations of bounded size, and provide a reduction to quantified Boolean
satisfiability. In Sect. 6 we define the approximate version of the problem, and
give a synthesis procedure based on maximum model counting. Finally, in Sect. 7
we present experimental results, and we conclude in Sect. 8.


2 Related Work 


Providing good-quality environment specifications (typically in the form of 
assumptions on the allowed behaviours of the environment) is crucial for the syn- 
thesis of implementations from high-level specifications. Formal specifications, 
and thus also environment assumptions, are often hard to get right, and have 
been identified as one of the bottlenecks in formal methods and autonomy [12]. 
It is therefore not surprising that there is a plethora of approaches addressing
the problem of how to revise inadequate environment assumptions in the cases
when these are the cause of unrealizability of the system requirements. 

Most approaches in this direction build upon the idea of analyzing the cause 
of unrealizability of the specification and extracting assumptions that help elim- 
inate this cause. The method proposed in [2] uses the game graph that is used 
to answer the realizability question in order to construct a Büchi automaton
representing a minimal assumption that makes the specification realizable. The 
authors of [8] provide an alternative approach where the environment assump- 
tions are gradually strengthened based on counterstrategies for the environment. 
The key ingredient for this approach is using a library of specification tem- 
plates and user scenarios for the mining of assumptions, in order to generate 
good-quality assumptions. A similar approach is used in [1], where, however, 
assumption patterns are synthesized directly from the counterstrategy without 
the need for the user to provide patterns. A different line of work focuses on 
giving feedback to the user or specification designer about the reason for unre- 
alizability, so that they can, if possible, revise the specification accordingly. The 
key challenge addressed there lies in providing easy-to-understand feedback to
users, which relies on finding a minimal cause for why the requirements are not 
achievable and generating a natural language explanation of this cause [11]. 

In the above-mentioned approaches, assumptions are provided or constructed
in the form of a temporal logic formula or an omega-automaton. Thus, on the
one hand, it is often difficult for specification designers to specify the right assumptions, and on the other hand, special care has to be taken by the assumption
generation procedures to ensure that the constructed assumptions are simple
enough for the user to understand and evaluate. The work [6] takes a different route by making assumptions about the size of the environment, that is,
by including a bound on the state space of the environment as an additional
parameter to the synthesis problem. Similarly to temporal logic assumptions, this
relaxation of the synthesis problem can turn unrealizable specifications into
realizable ones. From the system designer's point of view, however, it might be
significantly easier to estimate the size of environments that are feasible in practice
than to express the implications of this additional information in a temporal logic 
formula. In this paper we take a similar route to [6], and consider a bound on the 
cyclic structures in the environment’s behaviour. Thus, the closest to our work is 
the temporal synthesis for bounded environments studied in [6]. In fact, we show 
that the synthesis problem for lasso-precise implementations and the synthesis 
problem under bounded environments can be reduced to each other. However, 
while the focus in [6] is on the computational complexity of the bounded syn- 
thesis problems, here we provide both automata-theoretic, as well as symbolic 
approaches for solving the synthesis problem for environments with bounded 
lassos. We further consider an approximate version of this synthesis problem. 
The benefits of using approximation are two-fold. Firstly, as shown in [6], while 
bounding the environment can make some specifications realizable, this comes 
at a high computational complexity price. In this case, approximation might 
be able to provide solutions of sufficient quality more efficiently. Furthermore, 
even after bounding the environment’s input behaviours, the specification might
still remain unrealizable, in which case we would like to satisfy the requirements 
for as many input lassos as possible. In that sense, we get closer to synthesis 
methods for probabilistic temporal properties in probabilistic environments [7]. 
However, we consider non-probabilistic environments (i.e., we treat all possible
inputs as equally likely), and provide probabilistic guarantees with desired confidence by
employing maximum model counting techniques. Maximum model counting has
previously been used for the synthesis of approximate non-reactive programs [5].
Here, on the other hand, we are concerned with the synthesis of reactive systems
from temporal specifications. 

Bounding the size of the synthesized system implementation is a complemen- 
tary restriction of the synthesis problem, which has attracted a lot of attention 
in recent years [4]. The computational complexity of the synthesis problem when 
both the system’s and the environment’s size is bounded has been studied in [6]. 
In this paper we provide a symbolic synthesis procedure for bounded synthesis 
of lasso-precise implementations based on quantified Boolean satisfiability. 


3 Preliminaries 


We now recall definitions and notation from formal languages and automata, 
and notions from reactive synthesis such as implementation and environment. 


Linear-Time Properties and Lassos. A linear-time property $\varphi$ over an alphabet $\Sigma$
is a set of infinite words $\varphi \subseteq \Sigma^\omega$. Elements of $\varphi$ are called models of $\varphi$. A lasso
of length $k$ over an alphabet $\Sigma$ is a pair $(u, v)$ of finite words $u \in \Sigma^*$ and $v \in \Sigma^+$
with $|u \cdot v| = k$ that induces the ultimately periodic word $u \cdot v^\omega$. We call $u \cdot v$
the base of the lasso or ultimately periodic word, and $k$ the length of the lasso.

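If a word $w \in \Sigma^*$ is a prefix of a word $\sigma \in \Sigma^* \cup \Sigma^\omega$, we write $w \leq \sigma$. For a
language $L \subseteq \Sigma^* \cup \Sigma^\omega$, we define $\mathit{Prefix}(L) = \{w \in \Sigma^* \mid \exists \sigma \in L: w \leq \sigma\}$ as the
set of all finite words that are prefixes of words in $L$.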
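As a small illustration of these definitions, the following Python sketch unrolls a lasso $(u, v)$ into a finite prefix of $u \cdot v^\omega$ and tests the prefix relation $w \leq u \cdot v^\omega$. It is a sketch under our own assumptions: letters are arbitrary hashable values, and the function names `unroll` and `is_prefix_of_lasso` are hypothetical, not taken from the paper.

```python
from typing import Hashable, List, Sequence


def unroll(u: Sequence[Hashable], v: Sequence[Hashable], n: int) -> List[Hashable]:
    """Return the first n letters of the ultimately periodic word u . v^omega."""
    if not v:
        raise ValueError("the loop v of a lasso must be non-empty")
    return [u[i] if i < len(u) else v[(i - len(u)) % len(v)] for i in range(n)]


def is_prefix_of_lasso(w: Sequence[Hashable], u: Sequence[Hashable],
                       v: Sequence[Hashable]) -> bool:
    """Check w <= u . v^omega by comparing w with an unrolling of the same length."""
    return list(w) == unroll(u, v, len(w))


# Different lassos can induce the same word: (a, a) and (eps, aa) both induce a^omega.
assert unroll("a", "a", 5) == unroll("", "aa", 5)
```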


Implementations. We represent implementations as labeled transition systems. 
Let $I$ and $O$ be finite sets of input and output atomic propositions, respectively. A
$2^O$-labeled $2^I$-transition system is a tuple $\mathcal{T} = (T, t_0, \tau, o)$, consisting of a finite
set of states $T$, an initial state $t_0 \in T$, a transition function $\tau: T \times 2^I \to T$, and
a labeling function $o: T \to 2^O$. We denote by $|\mathcal{T}|$ the size of an implementation
$\mathcal{T}$, defined as $|\mathcal{T}| = |T|$. A path in $\mathcal{T}$ is a sequence $\pi: \mathbb{N} \to T \times 2^I$ of states and
inputs that follows the transition function, i.e., for all $i \in \mathbb{N}$, if $\pi(i) = (t_i, e_i)$ and
$\pi(i+1) = (t_{i+1}, e_{i+1})$, then $t_{i+1} = \tau(t_i, e_i)$. We call a path initial if it starts
with the initial state: $\pi(0) = (t_0, e)$ for some $e \in 2^I$. For an initial path $\pi$, we
call the sequence $\sigma_\pi: i \mapsto (o(t_i) \cup e_i) \in (2^{I \cup O})^\omega$ the trace of $\pi$. We call the set
of traces of a transition system $\mathcal{T}$ the language of $\mathcal{T}$, denoted $\mathcal{L}(\mathcal{T})$.
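To make the definition concrete, the sketch below stores a $2^O$-labeled $2^I$-transition system as explicit Python dictionaries and computes the first letters of the trace $\sigma_\pi(i) = o(t_i) \cup e_i$ along the initial path driven by a finite input sequence. Representing valuations as `frozenset`s of proposition names, the class name `TransitionSystem`, and the example system are our own illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Sequence, Tuple

Valuation = FrozenSet[str]  # the set of propositions that are true


@dataclass
class TransitionSystem:
    """A 2^O-labeled 2^I-transition system (T, t0, tau, o) with explicit tables."""
    initial: str                              # t_0
    tau: Dict[Tuple[str, Valuation], str]     # tau: T x 2^I -> T
    out: Dict[str, Valuation]                 # o: T -> 2^O

    def size(self) -> int:
        return len(self.out)                  # number of states, one label per state

    def trace_prefix(self, inputs: Sequence[Valuation]) -> List[Valuation]:
        """Letters o(t_i) | e_i of the trace along the initial path driven by inputs."""
        state, trace = self.initial, []
        for e in inputs:
            trace.append(self.out[state] | e)
            state = self.tau[(state, e)]      # t_{i+1} = tau(t_i, e_i)
        return trace


# Hypothetical example: input r, output g; the system grants one step after a request.
NONE, R, G = frozenset(), frozenset({"r"}), frozenset({"g"})
echo = TransitionSystem(
    initial="idle",
    tau={("idle", NONE): "idle", ("idle", R): "grant",
         ("grant", NONE): "idle", ("grant", R): "grant"},
    out={"idle": NONE, "grant": G},
)
```

Because the output is attached to the state, a grant caused by a request at step $i$ becomes visible in the trace only at step $i + 1$.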
Finite-state environments can be represented as labelled transition systems 
in a similar way, with the difference that the inputs are the outputs of the 
implementation, and the states of the environment are labelled with inputs for 
the implementation. More precisely, a finite-state environment is a $2^I$-labeled
$2^O$-transition system $\mathcal{E} = (E, s_0, \rho, \iota)$. The composition of an implementation $\mathcal{T}$
and an environment $\mathcal{E}$ results in a set of traces of $\mathcal{T}$, which we denote $\mathcal{L}_\mathcal{E}(\mathcal{T})$,
where $\sigma = \sigma_0 \sigma_1 \ldots \in \mathcal{L}_\mathcal{E}(\mathcal{T})$ if and only if $\sigma \in \mathcal{L}(\mathcal{T})$ and there exists an initial
path $s_0 s_1 \ldots$ in $\mathcal{E}$ such that for all $i \in \mathbb{N}$, $s_{i+1} = \rho(s_i, \sigma_{i+1} \cap O)$ and $\sigma_i \cap I = \iota(s_i)$.


Linear-Time Temporal Logic. We specify properties of reactive systems (implementations) as formulas in Linear-time Temporal Logic (LTL) [9]. We consider
the usual temporal operators Next $\bigcirc$, Until $\mathcal{U}$, and the derived operators Release
$\mathcal{R}$, which is the dual operator of $\mathcal{U}$, Eventually $\Diamond$ and Globally $\Box$. LTL formulas
are defined over a set of atomic propositions $AP$. We denote the satisfaction of
an LTL formula $\varphi$ by an infinite sequence $\sigma \in (2^{AP})^\omega$ of valuations of the atomic
propositions by $\sigma \models \varphi$ and call $\sigma$ a model of $\varphi$. For an LTL formula $\varphi$ we define
the language $\mathcal{L}(\varphi)$ of $\varphi$ to be the set $\{\sigma \in (2^{AP})^\omega \mid \sigma \models \varphi\}$.

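For a set of atomic propositions $AP = O \cup I$, we say that a $2^O$-labeled $2^I$-transition system $\mathcal{T}$ satisfies an LTL formula $\varphi$ if and only if $\mathcal{L}(\mathcal{T}) \subseteq \mathcal{L}(\varphi)$, i.e.,
every trace of $\mathcal{T}$ satisfies $\varphi$. In this case we call $\mathcal{T}$ a model of $\varphi$, denoted $\mathcal{T} \models \varphi$.
If $\mathcal{T}$ satisfies $\varphi$ for an environment $\mathcal{E}$, i.e., $\mathcal{L}_\mathcal{E}(\mathcal{T}) \subseteq \mathcal{L}(\varphi)$, we write $\mathcal{T} \models_\mathcal{E} \varphi$.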

For $I \subseteq AP$ and $\sigma \in (2^{AP})^* \cup (2^{AP})^\omega$, we denote with $\sigma|_I$ the projection of
$\sigma$ on $I$, obtained as the sequence of valuations of the propositions from $I$ in $\sigma$.


Automata Over Infinite Words. The automata-theoretic approach to reactive 
synthesis relies on the fact that an LTL specification can be translated to an 
automaton over infinite words, or, alternatively, that the specification can be 
provided directly as such an automaton. An alternating parity automaton over 
an alphabet $\Sigma$ is a tuple $\mathcal{A} = (Q, Q_0, \delta, \mu)$, where $Q$ denotes a finite set of states,
$Q_0 \subseteq Q$ denotes a set of initial states, $\delta$ denotes a transition function, and
$\mu: Q \to C \subset \mathbb{N}$ is a coloring function. The transition function $\delta: Q \times \Sigma \to \mathbb{B}^+(Q)$
maps a state and an input letter to a positive Boolean combination of states [14].

A tree $T$ over a set of directions $D$ is a prefix-closed subset of $D^*$. The empty
sequence $\epsilon$ is called the root. The children of a node $n \in T$ are the nodes $\{n \cdot d \in T \mid d \in D\}$. A $\Sigma$-labeled tree is a pair $(T, l)$, where $l: T \to \Sigma$ is the labeling
function. A run of $\mathcal{A} = (Q, Q_0, \delta, \mu)$ on an infinite word $\sigma = a_0 a_1 \cdots \in \Sigma^\omega$ is a
$Q$-labeled tree $(T, l)$ that satisfies the following constraints: (1) $l(\epsilon) \in Q_0$, and
(2) for all $n \in T$, if $l(n) = q$, then $\{l(n') \mid n' \text{ is a child of } n\}$ satisfies $\delta(q, a_{|n|})$.

A run tree is accepting if every branch either hits a true transition or is an
infinite branch $n_0 n_1 n_2 \cdots$ of $T$ such that the sequence $l(n_0) l(n_1) l(n_2) \ldots$ satisfies the
parity condition, which requires that the highest color occurring infinitely often
in the sequence $\mu(l(n_0)) \mu(l(n_1)) \mu(l(n_2)) \cdots \in \mathbb{N}^\omega$ is even. An infinite word $\sigma$ is
accepted by an automaton $\mathcal{A}$ if there exists an accepting run of $\mathcal{A}$ on $\sigma$. The set
of infinite words accepted by $\mathcal{A}$ is called its language, denoted $\mathcal{L}(\mathcal{A})$.
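When a branch is itself ultimately periodic, its color sequence has the form $c_{pre} \cdot (c_{loop})^\omega$, and exactly the colors of the loop occur infinitely often. The parity condition then reduces to a check on the loop alone, as in the following Python sketch. It covers only this special case (branches of general run trees need not be ultimately periodic), and the function name is our own.

```python
from typing import Sequence


def parity_accepts(prefix_colors: Sequence[int], loop_colors: Sequence[int]) -> bool:
    """Parity condition on the color sequence prefix . loop^omega.

    Colors in the finite prefix occur only finitely often, so only the loop
    matters: the highest color seen infinitely often must be even.
    """
    if not loop_colors:
        raise ValueError("the loop must be non-empty")
    return max(loop_colors) % 2 == 0


# Buechi as parity with colors {1, 2} (F has color 2): accept iff F is in the loop.
assert parity_accepts([1], [1, 2]) and not parity_accepts([2, 2], [1])
# co-Buechi as parity with colors {0, 1} (F has color 1): accept iff F is not in the loop.
assert parity_accepts([1, 1], [0]) and not parity_accepts([], [0, 1])
```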

A nondeterministic automaton is a special alternating automaton, where for
all states $q$ and input letters $a$, $\delta(q, a)$ is a disjunction. An alternating automaton
is called universal if, for all states $q$ and input letters $a$, $\delta(q, a)$ is a conjunction.
A universal and nondeterministic automaton is called deterministic.
A parity automaton is called a Büchi automaton if and only if the image of
$\mu$ is contained in $\{1, 2\}$, and a co-Büchi automaton if and only if the image of $\mu$ is
contained in $\{0, 1\}$. Büchi and co-Büchi automata are denoted by $(Q, Q_0, \delta, F)$,
where $F \subseteq Q$ denotes the states with the higher color. A run graph of a Büchi
automaton is thus accepting if, on every infinite path, there are infinitely many
visits to states in $F$; a run graph of a co-Büchi automaton is accepting if, on
every path, there are only finitely many visits to states in $F$.

The next theorem states the relation between LTL and alternating Büchi
automata, namely that every LTL formula $\varphi$ can be translated to an alternating
Büchi automaton with the same language and size linear in the length of $\varphi$.


Theorem 1. [13] For every LTL formula $\varphi$ there is an alternating Büchi
automaton $\mathcal{A}$ of size $O(|\varphi|)$ with $\mathcal{L}(\mathcal{A}) = \mathcal{L}(\varphi)$, where $|\varphi|$ is the length of $\varphi$.


Automata Over Finite Words. We also use automata over finite words as acceptors for languages consisting of prefixes of traces. A nondeterministic finite
automaton over an alphabet $\Sigma$ is a tuple $\mathcal{A} = (Q, Q_0, \delta, F)$, where $Q$ and $Q_0 \subseteq Q$
are again the states and initial states, respectively, $\delta: Q \times \Sigma \to 2^Q$ is the transition function, and $F$ is the set of accepting states. A run on a word $a_1 \ldots a_n$ is
a sequence of states $q_0 q_1 \cdots q_n$, where $q_0 \in Q_0$ and $q_{i+1} \in \delta(q_i, a_{i+1})$. The run is
accepting if $q_n \in F$. Deterministic finite automata are defined similarly with the
difference that there is a single initial state $q_0$, and that the transition function
is of the form $\delta: Q \times \Sigma \to Q$. As usual, we denote the set of words accepted by
a nondeterministic or deterministic finite automaton $\mathcal{A}$ by $\mathcal{L}(\mathcal{A})$.
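Whether some run of a nondeterministic finite automaton on a word is accepting can be decided by tracking the set of states reachable after each prefix, which is the standard subset simulation. The sketch below is illustrative only; the dictionary encoding of $\delta$ and the example automaton are our own choices.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Delta = Dict[Tuple[Hashable, Hashable], Set[Hashable]]   # delta: Q x Sigma -> 2^Q


def nfa_accepts(initial: Iterable[Hashable], delta: Delta,
                accepting: Set[Hashable], word: Iterable[Hashable]) -> bool:
    """Is there a run q_0 q_1 ... q_n on the word with q_0 in Q_0 and q_n in F?"""
    current = set(initial)          # states reachable after reading a_1 ... a_i
    for a in word:
        current = {q2 for q in current for q2 in delta.get((q, a), set())}
        if not current:             # every run has died
            return False
    return bool(current & accepting)


# Example: words over {0, 1} whose second-to-last letter is 1.
SECOND_LAST = {("p", "0"): {"p"}, ("p", "1"): {"p", "q"},
               ("q", "0"): {"r"}, ("q", "1"): {"r"}}
```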


4 Synthesis of Lasso-Precise Implementations 


In this section we first define the synthesis problem for environments producing 
input sequences representable as lassos of length bounded by a given number. 
We then provide an automata-theoretic algorithm for this synthesis problem. 


4.1 Lasso-Precise Implementations 


We begin by formally defining the language of sequences of input values representable by lassos of a given length $k$. For the rest of the section, we consider
linear-time properties defined over a set of atomic propositions $AP$. The subset
$I \subseteq AP$ consists of the input atomic propositions controlled by the environment.


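Definition 1 (Bounded Model Languages). Let $\varphi$ be a linear-time property
over a set of atomic propositions $AP$, let $\Sigma = 2^{AP}$, and let $I \subseteq AP$.

We say that an infinite word $\sigma \in \Sigma^\omega$ is an $I$-$k$-model of $\varphi$, for a bound $k \in \mathbb{N}$,
if and only if $\sigma \in \varphi$ and there are words $u \in (2^I)^*$ and $v \in (2^I)^+$ such that $|u \cdot v| = k$ and
$\sigma|_I = u \cdot v^\omega$. The language of $I$-$k$-models of the property $\varphi$ is defined by the set
$\mathcal{L}^I_k(\varphi) = \{\sigma \in \Sigma^\omega \mid \sigma \text{ is an } I\text{-}k\text{-model of } \varphi\}$.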
Note that a model of $\varphi$ might be induced by lassos of different length and by
more than one lasso of the same length, e.g., $a^\omega$ is induced by $(a, a)$ and $(\epsilon, aa)$.
The next lemma establishes that if a model of $\varphi$ can be represented by a lasso
of length $k$, then it can also be represented by a lasso of any larger length.


Lemma 1. For a linear-time property $\varphi$ over $\Sigma = 2^{AP}$, subset $I \subseteq AP$ of
atomic propositions, and bound $k \in \mathbb{N}$, we have $\mathcal{L}^I_k(\varphi) \subseteq \mathcal{L}^I_{k'}(\varphi)$ for all $k' \geq k$.


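Proof. Let $\sigma \in \mathcal{L}^I_k(\varphi)$. Then $\sigma \models \varphi$ and there exists $(u, v) \in (2^I)^* \times (2^I)^+$ such
that $|u \cdot v| = k$ and $\sigma|_I = u \cdot v^\omega$. Let $v = v_1 \ldots v_m$. Since $u \cdot v_1 \cdot (v_2 \ldots v_m v_1)^\omega =
u \cdot (v_1 \ldots v_m)^\omega = \sigma|_I$ and $|u \cdot v_1 \cdot v_2 \ldots v_m v_1| = k + 1$, we have $\sigma \in \mathcal{L}^I_{k+1}(\varphi)$. The claim follows by induction.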
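The step used in the proof moves the first loop letter into the stem and rotates the loop. The following Python sketch, with hypothetical function names, performs this step and checks on a long unrolling that the induced word is unchanged while the lasso grows by exactly one letter.

```python
from typing import List, Sequence, Tuple


def unroll(u: Sequence, v: Sequence, n: int) -> list:
    """First n letters of u . v^omega."""
    return [u[i] if i < len(u) else v[(i - len(u)) % len(v)] for i in range(n)]


def lengthen(u: Sequence, v: Sequence) -> Tuple[List, List]:
    """Lemma 1 step: (u, v1 v2 ... vm) becomes (u v1, v2 ... vm v1), one letter longer."""
    u, v = list(u), list(v)
    return u + v[:1], v[1:] + v[:1]


u2, v2 = lengthen("a", "bcd")
# The base grows by exactly one letter while the induced word stays the same.
assert len(u2) + len(v2) == len("abcd") + 1
assert unroll(u2, v2, 30) == unroll("a", "bcd", 30)
```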


Using the definition of $I$-$k$-models, the language of infinite sequences of environment inputs representable by lassos of length $k$ can be expressed as $\mathcal{L}^I_k(\Sigma^\omega)$.


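Definition 2 ($k$-lasso-precise Implementations). For a linear-time property $\varphi$ over $\Sigma = 2^{AP}$, subset $I \subseteq AP$ of atomic propositions, and bound $k \in \mathbb{N}$,
we say that a transition system $\mathcal{T}$ is a $k$-lasso-precise implementation of $\varphi$,
denoted $\mathcal{T} \models_{k,I} \varphi$, if it holds that $\mathcal{L}^I_k(\mathcal{L}(\mathcal{T})) \subseteq \varphi$.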


That is, in a $k$-lasso-precise implementation $\mathcal{T}$, all the traces of $\mathcal{T}$ that belong
to the language $\mathcal{L}^I_k(\Sigma^\omega)$ are $I$-$k$-models of the specification $\varphi$.


Problem definition: Synthesis of Lasso-Precise Implementations
Given a linear-time property $\varphi$ over atomic propositions $AP$ with input atomic
propositions $I$, and given a bound $k \in \mathbb{N}$, construct an implementation $\mathcal{T}$ such
that $\mathcal{T} \models_{k,I} \varphi$, or determine that such an implementation does not exist.
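For a small, explicitly given deterministic system, the condition $\mathcal{T} \models_{k,I} \varphi$ can be checked by brute force: enumerate every input lasso of length $k$, run the system on it, and test the resulting trace. The trace is itself a lasso, because the pair (position in the input lasso, system state) determines the rest of the run. The sketch below assumes a hypothetical interface in which the specification is a Python predicate on the trace lasso (prefix, loop); it is an illustration, not the synthesis algorithm of this section.

```python
from itertools import product
from typing import Callable, FrozenSet, Iterator, List, Sequence, Tuple

Letter = FrozenSet[str]
# Hypothetical spec interface: a predicate on a trace given as a lasso (prefix, loop).
Spec = Callable[[List[Letter], List[Letter]], bool]


def input_lassos(alphabet: Sequence[Letter], k: int) -> Iterator[Tuple[list, list]]:
    """All lassos (u, v) over 2^I with |u . v| = k and v non-empty."""
    for base in product(alphabet, repeat=k):
        for split in range(k):
            yield list(base[:split]), list(base[split:])


def trace_lasso(initial, tau, out, u, v) -> Tuple[list, list]:
    """Run a deterministic system on u . v^omega and return its trace as a lasso."""
    trace, seen, state, pos = [], {}, initial, 0
    while True:
        idx = pos if pos < len(u) else len(u) + (pos - len(u)) % len(v)
        if pos >= len(u):
            key = (idx, state)      # input position and system state fix the future
            if key in seen:
                return trace[:seen[key]], trace[seen[key]:]
            seen[key] = pos
        e = u[idx] if idx < len(u) else v[idx - len(u)]
        trace.append(out[state] | e)
        state = tau[(state, e)]
        pos += 1


def is_lasso_precise(initial, tau, out, inputs, k: int, spec: Spec) -> bool:
    """Brute-force test of T |=_{k,I} phi: all traces on k-lasso inputs satisfy spec."""
    return all(spec(*trace_lasso(initial, tau, out, u, v))
               for u, v in input_lassos(inputs, k))


# Hypothetical example: grant one step after each request.
NONE, R, G = frozenset(), frozenset({"r"}), frozenset({"g"})
TAU = {("idle", NONE): "idle", ("idle", R): "grant",
       ("grant", NONE): "idle", ("grant", R): "grant"}
OUT = {"idle": NONE, "grant": G}


def infinitely_often_g(prefix, loop):          # G F g
    return any("g" in a for a in loop)


def grant_next(prefix, loop):                  # G (r -> X g)
    word = prefix + loop + loop[:1]
    return all("g" in word[i + 1]
               for i in range(len(prefix) + len(loop)) if "r" in word[i])
```

Here `grant_next` holds on every input lasso, whereas `infinitely_often_g` already fails for the length-1 input lasso in which no request is ever made.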

Another way to bound the behaviour of the environment is to consider a 
bound on the size of its state space. The synthesis problem for bounded environments asks, for a given linear temporal property $\varphi$ and a bound $k \in \mathbb{N}$, to
synthesize a transition system $\mathcal{T}$ such that for every possible environment $\mathcal{E}$ of
size at most $k$, the transition system $\mathcal{T}$ satisfies $\varphi$ under environment $\mathcal{E}$, i.e.,
$\mathcal{T} \models_\mathcal{E} \varphi$.

We now establish the relationship between the synthesis of lasso-precise 
implementations and synthesis under bounded environments. Intuitively, the two 
synthesis problems can be reduced to each other since an environment of a given 
size, interacting with a given implementation, can only produce ultimately peri- 
odic sequences of inputs representable by lassos of length determined by the sizes 
of the environment and the implementation. This intuition is formalized in the 
following proposition, stating the connection between the two problems. 


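Proposition 1. Given a specification $\varphi$ over a set of atomic propositions $AP$
with subset $I \subseteq AP$ of atomic propositions controlled by the environment, and a
bound $k \in \mathbb{N}$, for every transition system $\mathcal{T}$ the following statements hold:


(1) If $\mathcal{T} \models_\mathcal{E} \varphi$ for all environments $\mathcal{E}$ of size at most $k$, then $\mathcal{T} \models_{k,I} \varphi$.
(2) If $\mathcal{T} \models_{k \cdot |\mathcal{T}|, I} \varphi$, then $\mathcal{T} \models_\mathcal{E} \varphi$ for all environments $\mathcal{E}$ of size at most $k$.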
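Proof. For (1), let $\mathcal{T}$ be a transition system such that $\mathcal{T} \models_\mathcal{E} \varphi$ for all environments $\mathcal{E}$ of size at most $k$. Assume, for the sake of contradiction, that $\mathcal{T} \not\models_{k,I} \varphi$.
Then there exists a word $\sigma \in \mathcal{L}(\mathcal{T})$ such that $\sigma \in \mathcal{L}^I_k(\Sigma^\omega)$ and $\sigma \not\models \varphi$.

Since $\sigma \in \mathcal{L}^I_k(\Sigma^\omega)$, we can construct an environment $\mathcal{E}$ of size at most $k$ that
produces the sequence of inputs $\sigma|_I$ (one state per position of the lasso, ignoring the outputs of $\mathcal{T}$). Since $\mathcal{E}$ is of size at most $k$, we have that
$\mathcal{T} \models_\mathcal{E} \varphi$. Thus, since $\sigma \in \mathcal{L}_\mathcal{E}(\mathcal{T})$, we have $\sigma \models \varphi$, which is a contradiction.

For (2), let $\mathcal{T}$ be a transition system such that $\mathcal{T} \models_{k \cdot |\mathcal{T}|, I} \varphi$. Assume, for
the sake of contradiction, that there exists an environment $\mathcal{E}$ of size at most $k$
such that $\mathcal{T} \not\models_\mathcal{E} \varphi$. Then there exists $\sigma \in \mathcal{L}_\mathcal{E}(\mathcal{T})$ such that $\sigma \not\models \varphi$. As
the number of states of $\mathcal{E}$ is at most $k$, the composition of $\mathcal{E}$ and $\mathcal{T}$ has at most $k \cdot |\mathcal{T}|$ joint states, so the input sequences it generates can be
represented as lassos of length at most $k \cdot |\mathcal{T}|$, and hence, by Lemma 1, as lassos of length exactly $k \cdot |\mathcal{T}|$. Thus, $\sigma \in \mathcal{L}^I_{k \cdot |\mathcal{T}|}(\Sigma^\omega)$. This is a contradiction
with the choice of $\mathcal{T}$, according to which $\mathcal{T} \models_{k \cdot |\mathcal{T}|, I} \varphi$.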
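The counting argument in part (2) can be replayed on explicit systems. In the composition of Sect. 3, the pair $(t_i, s_i)$ of implementation and environment states determines the entire future. The input sequence therefore becomes periodic as soon as a pair repeats, which happens after at most $|\mathcal{T}| \cdot |\mathcal{E}|$ steps. The Python sketch below assumes the timing convention of that composition ($e_i = \iota(s_i)$, $t_{i+1} = \tau(t_i, e_i)$, $s_{i+1} = \rho(s_i, o(t_{i+1}))$). The example implementation and environment are hypothetical.

```python
from typing import Dict, Hashable, List, Tuple


def product_input_lasso(t0: Hashable, tau: Dict, out: Dict,
                        s0: Hashable, rho: Dict, iota: Dict) -> Tuple[List, List]:
    """Compose T with environment E and return sigma|_I as a lasso (u, v).

    e_i = iota(s_i), t_{i+1} = tau(t_i, e_i), s_{i+1} = rho(s_i, o(t_{i+1})).
    The pair (t_i, s_i) determines the future, so some pair repeats after at
    most |T| * |E| steps and the lasso has length at most |T| * |E|.
    """
    seen: Dict[Tuple[Hashable, Hashable], int] = {}
    inputs: List = []
    t, s = t0, s0
    while (t, s) not in seen:
        seen[(t, s)] = len(inputs)
        e = iota[s]                 # the environment's state is labeled with an input
        inputs.append(e)
        t = tau[(t, e)]
        s = rho[(s, out[t])]        # the environment reacts to the new output
    start = seen[(t, s)]
    return inputs[:start], inputs[start:]


# Hypothetical example: T grants one step after a request; E asks until it sees a grant.
NONE, R, G = frozenset(), frozenset({"r"}), frozenset({"g"})
TAU = {("idle", NONE): "idle", ("idle", R): "grant",
       ("grant", NONE): "idle", ("grant", R): "grant"}
OUT = {"idle": NONE, "grant": G}
RHO = {("ask", G): "quiet", ("ask", NONE): "ask",
       ("quiet", G): "ask", ("quiet", NONE): "ask"}
IOTA = {"ask": R, "quiet": NONE}
```

For this example, the call `product_input_lasso("idle", TAU, OUT, "ask", RHO, IOTA)` returns the lasso `([], [R, NONE])` of length 2, well within the bound $|\mathcal{T}| \cdot |\mathcal{E}| = 4$.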


4.2 Automata-Theoretic Synthesis of Lasso-Precise 
Implementations 


We now provide an automata-theoretic algorithm for the synthesis of lasso-precise implementations. The underlying idea of this approach is to first construct an automaton over finite traces that accepts all finite prefixes of traces in
$\mathcal{L}^I_k(\Sigma^\omega)$. Then, by combining this automaton with an automaton representing the
property $\varphi$, we can construct an automaton whose language is non-empty if and
only if there exists a $k$-lasso-precise implementation of $\varphi$.

The next theorem presents the construction of a deterministic finite automaton for the language $\mathit{Prefix}(\mathcal{L}^I_k(\Sigma^\omega))$.


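Theorem 2. For any set $AP$ of atomic propositions, subset $I \subseteq AP$, and bound
$k \in \mathbb{N}$, there is a deterministic finite automaton $\mathcal{A}_k$ over alphabet $\Sigma = 2^{AP}$, with
size $(2^{|I|} + 1)^k \cdot (k + 1)^k$, such that $\mathcal{L}(\mathcal{A}_k) = \{w \in \Sigma^* \mid \exists \sigma \in \mathcal{L}^I_k(\Sigma^\omega).\ w \leq \sigma\}$.


Idea & Construction. For a given $k \in \mathbb{N}$ we first define an automaton $\hat{\mathcal{A}}_k = (\hat{Q}, \hat{q}_0, \hat{\delta}, \hat{F})$ over $\hat{\Sigma} = 2^I$, such that $\mathcal{L}(\hat{\mathcal{A}}_k) = \{\hat{w} \in \hat{\Sigma}^* \mid \exists \hat{\sigma} \in \mathcal{L}^I_k(\hat{\Sigma}^\omega).\ \hat{w} \leq \hat{\sigma}\}$.
That is, $\mathcal{L}(\hat{\mathcal{A}}_k)$ is the set of all finite prefixes of infinite words over $\hat{\Sigma}$ that can
be represented by a lasso of length $k$. We can then define the automaton $\mathcal{A}_k$ as
the automaton that, for each $w \in \Sigma^*$, simulates $\hat{\mathcal{A}}_k$ on the projection $w|_I$ of $w$.
We define the automaton $\hat{\mathcal{A}}_k = (\hat{Q}, \hat{q}_0, \hat{\delta}, \hat{F})$ such that

- $\hat{Q} = (\hat{\Sigma} \cup \{\#\})^k \times \{-, 1, \ldots, k\}^k$,
- $\hat{q}_0 = (\#^k, (1, \ldots, k))$,
- $\hat{\delta}(q, a) = (w \cdot a \cdot \#^{m-1}, t)$ if $q = (w \cdot \#^m, t)$, where $1 \leq m \leq k$, $w \in \hat{\Sigma}^{k-m}$, and $t \in \{-, 1, \ldots, k\}^k$,
- $\hat{\delta}(q, a) = (w, (i'_1, \ldots, i'_k))$ if $q = (w, (i_1, \ldots, i_k))$, where $w \in \hat{\Sigma}^k$, and for each $j$: $i'_j = i_j + 1$ if $i_j < k \wedge w(i_j) = a$; $i'_j = j$ if $i_j = k \wedge w(i_j) = a$; and $i'_j = -$ if $i_j = -$ or $w(i_j) \neq a$,
- $\hat{F} = \hat{Q} \setminus \{(w, (-, \ldots, -)) \mid w \in \hat{\Sigma}^k\}$.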


Proof. States of the form $(w \cdot a \cdot \#^m, t)$ with $m \geq 1$ store the portion of the
input word read so far, for input words of length smaller than $k$. In states of this
form we have $t = (1, 2, \ldots, k)$, which implies that all such states are accepting.
In turn, this means that $\hat{\mathcal{A}}_k$ accepts all words of length smaller than or equal to $k$.
This is justified by the fact that each word of length smaller than or equal to $k$ is a
prefix of an infinite word in $\mathcal{L}^I_k(\hat{\Sigma}^\omega)$, obtained by extending it to a word of length $k$ and repeating that word infinitely
often. Now, let us consider words of length greater than $k$.

In states of the form $(u, (i_1, \ldots, i_k))$, with $u \in \hat{\Sigma}^k$, the word $u$ stores the
first $k$ letters of the input word. Intuitively, the tuple $(i_1, \ldots, i_k)$ stores the
information about the loops that are still possible, given the portion of the
input word that has been read thus far. To see this, let us consider a word $w \in \hat{\Sigma}^*$
such that $|w| = l > k$, and let $q_0 q_1 \ldots q_l$ be the run of $\hat{\mathcal{A}}_k$ on $w$. The state $q_l$
is of the form $q_l = (w(1) \ldots w(k), (i^l_1, \ldots, i^l_k))$. It can be shown by induction
on $l$ that for each $j$ we have $i^l_j \neq -$ if and only if $w$ is of the form $w = w' \cdot w'' \cdot w'''$, where $w' = w(1) \ldots w(j-1)$, $w'' = (w(j) \ldots w(k))^p$ for some $p > 0$,
and $w''' = w(j) \ldots w(i^l_j - 1)$. Thus, if $i^l_j \neq -$, then it is possible to have a
loop starting at position $j$, and $i^l_j$ is such that $w(j) \ldots w(i^l_j - 1)$ is the prefix
of $w(j) \ldots w(k)$ appearing after the sequence of repetitions of
$w(j) \ldots w(k)$. This means that if $i^l_j \neq -$, then $w$ is a prefix of the infinite word
$w' \cdot (w(j) \ldots w(k))^\omega \in \mathcal{L}^I_k(\hat{\Sigma}^\omega)$. Therefore, if the run of $\hat{\mathcal{A}}_k$ on a word $w$ with $|w| > k$ is
accepting, then there exists $\sigma \in \mathcal{L}^I_k(\hat{\Sigma}^\omega)$ such that $w \leq \sigma$.

For the other direction, suppose that for each $j$, we have $i^l_j = -$. Take any
$j$, and consider the first position $m$ in the run $q_0 q_1 \ldots q_l$ where $i^m_j = -$. By
the definition of $\hat{\delta}$ we have that $w(m) \neq w(i^{m-1}_j)$. This means that the prefix
$w(1) \ldots w(m)$ cannot be extended to the word $w(1) \ldots w(j-1)(w(j) \ldots w(k))^\omega$.
Since for every $j \in \{1, \ldots, k\}$ we can find such a position $m$, it holds that there
does not exist $\sigma \in \mathcal{L}^I_k(\hat{\Sigma}^\omega)$ such that $w \leq \sigma$. This concludes the proof.
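The construction can be mirrored in a few lines of Python operating directly on the projected word over $2^I$. The first $k$ letters are stored, and for every candidate loop start $j$ a pointer (0-indexed here, with `None` standing for $-$) records the position of the base that must be read next. This is our own illustration, not the paper's implementation. It assumes $k \geq 1$ and is cross-checked against a brute-force enumeration of all lassos of length $k$.

```python
from itertools import product
from typing import Hashable, List, Optional, Sequence

DEAD = None  # plays the role of "-" in the construction


def accepts_prefix(word: Sequence[Hashable], k: int) -> bool:
    """Simulate the DFA over 2^I: is `word` a prefix of some u . v^omega with |u . v| = k?"""
    base: List[Hashable] = []
    ptr: List[Optional[int]] = list(range(k))   # 0-indexed i_j, initially (1, ..., k)
    for a in word:
        if len(base) < k:                       # still filling the first k letters
            base.append(a)
            continue
        for j in range(k):
            i = ptr[j]
            if i is DEAD or base[i] != a:
                ptr[j] = DEAD                   # a loop starting at j is impossible
            else:
                ptr[j] = i + 1 if i < k - 1 else j   # advance, or wrap to the loop start
    return len(base) < k or any(i is not DEAD for i in ptr)


def brute_force(word: Sequence[Hashable], k: int, alphabet: Sequence[Hashable]) -> bool:
    """Reference check by enumerating every lasso (u, v) with |u . v| = k."""
    for base in product(alphabet, repeat=k):
        for j in range(k):                      # loop v = base[j:]
            unrolled = [base[i] if i < k else base[j + (i - k) % (k - j)]
                        for i in range(len(word))]
            if unrolled == list(word):
                return True
    return False
```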


The automaton constructed in the previous theorem has size that is exponential in the length of the lassos. In the next theorem we show that this exponential blow-up is unavoidable. That is, we show that every nondeterministic
finite automaton for the language $\mathit{Prefix}(\mathcal{L}^I_k(\Sigma^\omega))$ is of size at least $2^{\Omega(k)}$.
Theorem 3. For any bound $k \in \mathbb{N}$ and sets of atomic propositions $AP$ and $\emptyset \neq I \subseteq AP$, every nondeterministic finite automaton $\mathcal{N}$ over the alphabet $\Sigma = 2^{AP}$
that recognizes $L = \{w \in \Sigma^* \mid \exists \sigma \in \mathcal{L}^I_k(\Sigma^\omega).\ w \leq \sigma\}$ is of size at least $2^{\Omega(k)}$.


Proof. Let $\mathcal{N} = (Q, Q_0, \delta, F)$ be a nondeterministic finite automaton for $L$. For
each $w \in \Sigma^k$, we have that $w \cdot w \in L$. Therefore, for each $w \in \Sigma^k$ there exists
at least one accepting run $\rho = q_0 q_1 \ldots q_{2k}$ of $\mathcal{N}$ on $w \cdot w$. We denote with $q(\rho, m)$
the state $q_m$ that appears at the position indexed $m$ of a run $\rho$.

Let $a \in 2^I$ be a letter in $2^I$, and let $\Sigma' = \{a' \in \Sigma \mid a'|_I = a\}$. Let $L' \subseteq L$ be
the language $L' = \{w \in \Sigma^k \mid \exists w' \in (\Sigma \setminus \Sigma')^{k-1}, a' \in \Sigma': w = w' \cdot a'\}$.
That is, $L'$ consists of the words of length $k$ in which letters $a'$ with $a'|_I = a$
appear in the last position and only in the last position.

Let us define the set of states

$$Q_k = \{q(\rho, k) \mid \exists w \in L': \rho \text{ is an accepting run of } \mathcal{N} \text{ on } w \cdot w\}.$$

That is, $Q_k$ consists of the states that appear at position $k$ on some accepting
run on some word $w \cdot w$, where $w$ is from $L'$. We will show that $|Q_k| \geq 2^{k-1}$.

Assume that this does not hold, i.e., $|Q_k| < 2^{k-1}$. Since $|L'| \geq 2^{k-1}$, this
implies that there exist $w_1, w_2 \in L'$ such that $w_1|_I \neq w_2|_I$, and there exist
accepting runs $\rho_1$ and $\rho_2$ of $\mathcal{N}$ on $w_1 \cdot w_1$ and $w_2 \cdot w_2$, respectively, such that
$q(\rho_1, k) = q(\rho_2, k)$. That is, there must be two words in $L'$ with $w_1|_I \neq w_2|_I$
which have accepting runs on $w_1 \cdot w_1$ and $w_2 \cdot w_2$ visiting the same state at
position $k$.

We now construct a run $\rho_{1,2}$ on the word $w_1 \cdot w_2$ that follows $\rho_1$ for the
first $k$ steps on $w_1$, ending in state $q(\rho_1, k)$, and from there on follows $\rho_2$ on $w_2$.
It is easy to see that $\rho_{1,2}$ is a run on the word $w_1 \cdot w_2$. The run is accepting,
since $\rho_2$ is accepting. This means that $w_1 \cdot w_2 \in L$, which we will show leads to a
contradiction.

To see this, recall that $w_1 = w'_1 \cdot a'$ and $w_2 = w'_2 \cdot a''$, and $w_1|_I \neq w_2|_I$, and
$a'|_I = a''|_I = a$. Since $w_1 \cdot w_2 \in L$, we have that $w'_1 \cdot a' \cdot w'_2 \cdot a'' \leq \sigma$ for some
$\sigma \in \mathcal{L}^I_k(\Sigma^\omega)$. That is, there exists a lasso for some word $\sigma$, and $w'_1 \cdot a' \cdot w'_2 \cdot a''$ is
a prefix of this word. Since $a$ does not appear in $w'_1|_I$, this means that the loop
in this lasso is the whole word $w_1|_I$, which is not possible, since $w_1|_I \neq w_2|_I$.

This is a contradiction, which shows that $|Q| \geq |Q_k| \geq 2^{k-1}$. Since $\mathcal{N}$ was an
arbitrary nondeterministic finite automaton for $L$, this implies that the minimal
automaton for $L$ has at least $2^{\Omega(k)}$ states, which concludes the proof.


Using the automaton from Theorem 2, we can transform every property 
automaton A into an automaton that accepts words representable by lassos of 
length less than or equal to k if and only if they are in L(A), and accepts all 
words that are not representable by lassos of length less than or equal to k. 


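Theorem 4. Let $AP$ be a set of atomic propositions, and let $I \subseteq AP$. For every
(deterministic, nondeterministic or alternating) parity automaton $\mathcal{A}$ over $\Sigma = 2^{AP}$ and $k \in \mathbb{N}$, there is a (deterministic, nondeterministic or alternating) parity
automaton $\mathcal{A}'$ of size $2^{O(k)} \cdot |\mathcal{A}|$ such that $\mathcal{L}(\mathcal{A}') = (\mathcal{L}^I_k(\Sigma^\omega) \cap \mathcal{L}(\mathcal{A})) \cup (\Sigma^\omega \setminus \mathcal{L}^I_k(\Sigma^\omega))$.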
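Proof. The theorem is a consequence of Theorem 2, established as follows. Let
$\mathcal{A} = (Q, Q_0, \delta, \mu)$ be a parity automaton, and let $\mathcal{D} = (\tilde{Q}, \tilde{q}_0, \tilde{\delta}, \tilde{F})$ be the deterministic finite automaton for bound $k$ defined as in Theorem 2. We define the
parity automaton $\mathcal{A}' = (Q', Q'_0, \delta', \mu')$ with the following components:

- $Q' = Q \times \tilde{Q}$;
- $Q'_0 = \{(q_0, \tilde{q}_0) \mid q_0 \in Q_0\}$ (when $\mathcal{A}$ is deterministic, $Q'_0$ is a singleton set);
- $\delta'((q, \tilde{q}), a) = \delta(q, a)[q' / (q', \tilde{\delta}(\tilde{q}, a))]$, where $\delta(q, a)[q' / (q', \tilde{\delta}(\tilde{q}, a))]$ is the Boolean expression obtained from $\delta(q, a)$ by replacing every state $q'$ by the state $(q', \tilde{\delta}(\tilde{q}, a))$;
- $\mu'((q, \tilde{q})) = \mu(q)$ if $\tilde{q} \in \tilde{F}$, and $\mu'((q, \tilde{q})) = 0$ if $\tilde{q} \notin \tilde{F}$.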


Intuitively, the automaton $\mathcal{A}'$ is constructed as the product of $\mathcal{A}$ and $\mathcal{D}$, where
runs entering a state that is not accepting in $\mathcal{D}$ are accepting in $\mathcal{A}'$. To
see this, recall from the construction in Theorem 2 that once $\mathcal{D}$ enters a state in
$\tilde{Q} \setminus \tilde{F}$, it remains in such a state forever. Thus, by setting the color of all states
$(q, \tilde{q})$ where $\tilde{q} \notin \tilde{F}$ to 0, we ensure that words containing a prefix rejected by $\mathcal{D}$
have only runs in which the highest color appearing infinitely often is 0. Thus,
we ensure that all words that are not representable by lassos of length less than
or equal to $k$ are accepted by $\mathcal{A}'$, while words representable by lassos of length
less than or equal to $k$ are accepted if and only if they are in $\mathcal{L}(\mathcal{A})$.
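For the deterministic case, the product construction can be sketched by exploring the reachable part of $Q \times \tilde{Q}$ and assigning colors by the rule for $\mu'$. The example below is hypothetical. The DFA is a toy stand-in that rejects every word from its first `b` on, not the automaton of Theorem 2, but the coloring rule is applied exactly as described above.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple


def product_parity(delta_a: Dict, mu: Dict, q0: Hashable,
                   delta_d: Dict, accepting_d: Set, d0: Hashable,
                   alphabet: Iterable[Hashable]) -> Tuple[Dict, Dict]:
    """Reachable part of A' = A x D for a deterministic parity automaton A.

    mu'((q, d)) = mu(q) if d is accepting in D, and 0 otherwise, so runs
    that leave D's accepting states are accepted.
    """
    alphabet = list(alphabet)
    start = (q0, d0)
    delta, color, frontier = {}, {}, [start]
    color[start] = mu[q0] if d0 in accepting_d else 0
    while frontier:
        q, d = frontier.pop()
        for a in alphabet:
            succ = (delta_a[(q, a)], delta_d[(d, a)])
            delta[((q, d), a)] = succ
            if succ not in color:
                color[succ] = mu[succ[0]] if succ[1] in accepting_d else 0
                frontier.append(succ)
    return delta, color


# A accepts words with infinitely many b (x has color 1, y has color 2).
DELTA_A = {("x", "a"): "x", ("x", "b"): "y", ("y", "a"): "x", ("y", "b"): "y"}
MU = {"x": 1, "y": 2}
# Toy stand-in for D: its accepting state is left for good at the first b.
DELTA_D = {("ok", "a"): "ok", ("ok", "b"): "dead",
           ("dead", "a"): "dead", ("dead", "b"): "dead"}
```

With these tables, $a^\omega$ stays in $(x, ok)$ with color 1 and is rejected, while every word containing `b` moves into states of color 0 and is accepted.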


The following theorem is a consequence of the one above, and provides us with 
an automata-theoretic approach to solving the lasso-precise synthesis problem. 


Theorem 5 (Synthesis). Let $AP$ be a set of atomic propositions, and $I \subseteq AP$ be the subset of $AP$ consisting of the atomic propositions controlled by the
environment. For a specification given as a deterministic parity automaton $\mathcal{P}$
over the alphabet $\Sigma = 2^{AP}$, and a bound $k \in \mathbb{N}$, finding an implementation $\mathcal{T}$
such that $\mathcal{T} \models_{k,I} \mathcal{P}$ can be done in time polynomial in the size of the automaton
$\mathcal{P}$ and exponential in the bound $k$.


5 Bounded Synthesis of Lasso-Precise Implementations 


For a specification $\varphi$ given as an LTL formula, a bound $n$ on the size of the
synthesized implementation, and a bound $k$ on the lassos of input sequences,
bounded synthesis of lasso-precise implementations searches for an implementation $\mathcal{T}$ of size $n$ such that $\mathcal{T} \models_{k,I} \varphi$. Using the automata constructions in the
previous section, we can construct a universal co-Büchi automaton for the language $\mathcal{L}^I_k(\varphi) \cup (\Sigma^\omega \setminus \mathcal{L}^I_k(\Sigma^\omega))$ and construct the constraint system as presented
in [4]. This constraint system is exponential in both $|\varphi|$ and $k$. In the following,
we show how the problem can be encoded as a quantified Boolean formula of
size polynomial in $|\varphi|$ and $k$.
Theorem 6. For a specification given as an LTL formula $\varphi$, and bounds $k \in \mathbb{N}$
and $n \in \mathbb{N}$, there exists a quantified Boolean formula $\psi$ such that $\psi$ is satisfiable
if and only if there is a transition system $\mathcal{T} = (T, t_0, \tau, o)$ of size $n$ with $\mathcal{T} \models_{k,I} \varphi$.
The size of $\psi$ is in $O(|\varphi| + n^2 + k^2)$. The number of variables of $\psi$ is equal to
$n \cdot (n \cdot 2^{|I|} + |O|) + k \cdot (|I| + 1) + n \cdot k \cdot (|O| + n + 1)$.
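The variable count can be read off group by group: transition variables $\tau_{t,i,t'}$ and state labels $o_t$ for the system, input and loop variables for the input lasso of length $k$, and output, state, and loop variables for each of the $n \cdot k$ positions of the matching system lasso. The small Python sketch below reproduces this bookkeeping under that reading. The name $l'_j$ for the system-lasso loop variables is our notation, and the function names are hypothetical.

```python
def qbf_variable_count(n: int, k: int, num_inputs: int, num_outputs: int) -> int:
    """Variables of the QBF in Theorem 6, counted group by group."""
    transitions = n * n * 2 ** num_inputs         # tau_{t,i,t'}: t, t' in T, i in 2^I
    state_labels = n * num_outputs                # o_t: t in T, o in O
    input_lasso = k * num_inputs + k              # i_j and l_j for 0 <= j < k
    system_lasso = n * k * (num_outputs + n + 1)  # o_j, t_j and l'_j for 0 <= j < n*k
    return transitions + state_labels + input_lasso + system_lasso


def closed_form(n: int, k: int, num_inputs: int, num_outputs: int) -> int:
    """n*(n*2^|I| + |O|) + k*(|I| + 1) + n*k*(|O| + n + 1), as stated in Theorem 6."""
    return (n * (n * 2 ** num_inputs + num_outputs) + k * (num_inputs + 1)
            + n * k * (num_outputs + n + 1))


print(qbf_variable_count(2, 3, 1, 1))  # 8 + 2 + 6 + 24 = 40
```

The count is polynomial in $n$ and $k$ but exponential in $|I|$, because of the transition variables over $2^I$.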


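Construction. We encode the bounded synthesis problem in the following quantified Boolean formula:

$$
\begin{aligned}
&\exists \{\tau_{t,i,t'} \mid t, t' \in T, i \in 2^I\}.\ \exists \{o_t \mid t \in T, o \in O\}. &&(1)\\
&\forall \{i_j \mid i \in I, 0 \leq j < k\}.\ \forall \{l_j \mid 0 \leq j < k\}. &&(2)\\
&\forall \{o_j \mid o \in O, 0 \leq j < n \cdot k\}. &&(3)\\
&\forall \{t_j \mid t \in T, 0 \leq j < n \cdot k\}. &&(4)\\
&\forall \{l'_j \mid 0 \leq j < n \cdot k\}. &&(5)\\
&\varphi_{det} \wedge \big(\varphi_{lasso} \wedge \varphi^{n,k}_{\in\tau} \rightarrow [\![\varphi]\!]^0_{n \cdot k}\big) &&(6)
\end{aligned}
$$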

which we read as: there is a transition system (1) such that, for all input
sequences representable by lassos of length $k$ (2), the corresponding sequence
of outputs of the system (3) satisfies $\varphi$. The variables introduced in lines (4) and
(5) are necessary to encode the corresponding output for the chosen input lasso.
An assignment to the variables satisfies the formula in line (6) if it represents
a deterministic transition system ($\varphi_{det}$) in which lassos of length $n \cdot k$ ($\varphi^{n,k}_{\in\tau}$)
satisfy the property $\varphi$ ($[\![\varphi]\!]^0_{n \cdot k}$). These constraints are defined as follows.
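$\varphi_{det}$: A transition system is deterministic if for each state $t$ and input $i$ there
is exactly one transition $\tau_{t,i,t'}$ to some state $t'$:

$$\bigwedge_{t \in T} \bigwedge_{i \in 2^I} \bigvee_{t' \in T} \Big(\tau_{t,i,t'} \wedge \bigwedge_{t'' \neq t'} \neg \tau_{t,i,t''}\Big).$$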


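$\varphi^{n,k}_{\in\tau}$: For a certain input lasso of size $k$, we can match a lasso in the system of
size at most $n \cdot k$. A lasso of this size in the transition system matches the input
lasso if the following constraints are satisfied.

$$
\begin{aligned}
&\bigwedge_{0 \leq j < n \cdot k} \bigwedge_{t \in T} \Big(t_j \rightarrow \bigwedge_{o \in O} (o_j \leftrightarrow o_t)\Big) &&(7)\\
&(t_0)_0 &&(8)\\
&\bigwedge_{0 \leq j < n \cdot k - 1} \bigwedge_{i \in 2^I} \bigwedge_{t, t' \in T} \Big(\Big(\bigwedge_{0 \leq j' < k} (l_{j'} \rightarrow i_{\Delta(j,k,j')})\Big) \wedge t_j \wedge \tau_{t,i,t'} \rightarrow t'_{j+1}\Big) &&(9)\\
&\bigwedge_{i \in 2^I} \bigwedge_{t, t' \in T} \Big(\Big(\bigwedge_{0 \leq j' < k} (l_{j'} \rightarrow i_{\Delta(n \cdot k - 1,k,j')})\Big) \wedge t_{n \cdot k - 1} \wedge \tau_{t,i,t'} \rightarrow \bigvee_{0 \leq j < n \cdot k} (l'_j \wedge t'_j)\Big) &&(10)
\end{aligned}
$$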


Lines (9) and (10) make sure that the chosen lasso follows the guessed transition
relation $\tau$. Line (10) handles the loop transition of the lasso, and makes sure
that the loop of the lasso follows $\tau$. Line (7) is a necessary requirement in order
to match the output produced on the lasso with $\varphi$. If the output variables $o_j$
satisfy the constraint $[\![\varphi]\!]^0_{n \cdot k}$, then the lasso satisfies $\varphi$. As the input lasso is
smaller than its matching lasso in the system, we need to make sure that the
indices of the input variables are correct with respect to the chosen loop. This
is computed using the function $\Delta$, which is given by:

$$\Delta(j, k, j') = \begin{cases} j & \text{if } j < k,\\ ((j - k) \bmod (k - j')) + j' & \text{otherwise.} \end{cases}$$
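The index function can be sanity-checked by direct computation: for an input lasso of length $k$ whose loop starts at position $j'$, reading the base at index $\Delta(j, k, j')$ should give the $j$-th letter of the unrolled input word. The Python sketch below (with the hypothetical name `delta_index`, indices starting at 0) performs this cross-check for every loop start of a length-4 base.

```python
def delta_index(j: int, k: int, j_loop: int) -> int:
    """Input-lasso index read at system-lasso position j (input loop starts at j_loop)."""
    if j < k:
        return j
    return (j - k) % (k - j_loop) + j_loop


# Cross-check against unrolling the input lasso (u, v) = (base[:j_loop], base[j_loop:]).
base = "abcd"
for j_loop in range(len(base)):
    u, v = base[:j_loop], base[j_loop:]
    unrolled = [u[i] if i < len(u) else v[(i - len(u)) % len(v)] for i in range(20)]
    assert all(base[delta_index(j, len(base), j_loop)] == unrolled[j] for j in range(20))
```

The check works because $(j - j') \bmod (k - j') = (j - k) \bmod (k - j')$, so wrapping around the input loop amounts to a modular offset from the loop start $j'$.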

$\varphi_{lasso}$: The formula encodes the additional constraint that exactly one of the
loop variables can be true for a given variable valuation.

$[\![\varphi]\!]^0_m$: This constraint encodes the satisfaction of $\varphi$ on lassos of size $m$. The
encoding is similar to the encoding of bounded model checking [3], with the distinction of encoding the satisfaction relation of the atomic propositions, given
below. As the inputs run with different indices than the outputs, we again,
as in lines (9) and (10), need to compute the correct indices using the
function $\Delta$.


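$$
\begin{array}{l|l|l}
 & h < m & h = m\\ \hline
[\![i]\!]^h_m & \bigwedge_{0 \leq j' < k} (l_{j'} \rightarrow i_{\Delta(h,k,j')}) & \bigvee_{0 \leq j < m} \big(l'_j \wedge \bigwedge_{0 \leq j' < k} (l_{j'} \rightarrow i_{\Delta(j,k,j')})\big)\\
[\![\neg i]\!]^h_m & \bigwedge_{0 \leq j' < k} (l_{j'} \rightarrow \neg i_{\Delta(h,k,j')}) & \bigvee_{0 \leq j < m} \big(l'_j \wedge \bigwedge_{0 \leq j' < k} (l_{j'} \rightarrow \neg i_{\Delta(j,k,j')})\big)\\
[\![o]\!]^h_m & o_h & \bigvee_{0 \leq j < m} (l'_j \wedge o_j)\\
[\![\neg o]\!]^h_m & \neg o_h & \bigvee_{0 \leq j < m} (l'_j \wedge \neg o_j)
\end{array}
$$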


6 Synthesis of Approximate Implementations 


In some cases, specifications remain unrealizable even when considered under
bounded environments. Nevertheless, one might still be able to construct implementations that satisfy the specification for almost all input sequences of the
environment. Consider, for example, the following simplified arbiter specification:


$$\Box(\neg w \rightarrow \neg g) \wedge \Box(r \rightarrow \Diamond g)$$


The specification defines an arbiter that should give grants $g$ upon requests
$r$, but is not allowed to provide these grants unless a signal $w$ is true. The
specification is unrealizable, because a sequence of inputs where the signal $w$
is always false prevents the arbiter from answering any request. Bounding the
environment does not help in this case, as a lasso of size 1 already suffices to
violate the specification (the one where $w$ is always false). Nevertheless, one can
still find reasonable implementations that satisfy the specification for a large
fraction of input sequences. In particular, input sequences in which $w$ remains
false forever are comparatively unlikely.
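To make "a large fraction of input sequences" concrete, the Python sketch below counts, for small $k$, the input lassos on which the idealized strategy "grant exactly when $w$ holds" violates the specification. Two simplifications are our own assumptions and are not taken from the paper. First, the strategy reacts to the current input (Mealy-style), whereas the transition systems of Sect. 3 attach outputs to states. Second, lassos $(u, v)$ are counted as pairs rather than as distinct words.

```python
from fractions import Fraction
from itertools import product

# Input letters are valuations (r, w) of the request and the "grant allowed" signal.
LETTERS = [(r, w) for r in (False, True) for w in (False, True)]


def strategy_ok(u, v) -> bool:
    """Does the idealized strategy g := w satisfy G(r -> F g) on u . v^omega?

    With g := w the conjunct G(!w -> !g) holds trivially, so only G(r -> F g)
    is checked. Positions beyond |u| + |v| repeat earlier ones, and any w after
    such a position also appears within the next |v| letters.
    """
    word = list(u) + list(v) * 2
    return all(any(w for _, w in word[p:])
               for p in range(len(u) + len(v)) if word[p][0])


def violation_fraction(k: int) -> Fraction:
    """Fraction of input lassos (u, v) with |u . v| = k on which the strategy fails."""
    total = bad = 0
    for base in product(LETTERS, repeat=k):
        for split in range(k):
            total += 1
            bad += not strategy_ok(base[:split], base[split:])
    return Fraction(bad, total)
```

For $k = 1$ and $k = 2$ this yields exactly $1/4$ in both cases: the specification is violated, but only on a minority of the input lassos.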
Definition 3 ($\varepsilon$-$k$-Approximation). For a specification $\varphi$, a bound $k$, and an error rate $\varepsilon$, we say that a transition system $\mathcal{T}$ approximately satisfies $\varphi$ with error rate $\varepsilon$ for lassos of length at most $k$, denoted by $\mathcal{T} \models^{\varepsilon}_{k} \varphi$, if and only if

$$\frac{|\{\sigma \in L_k(L(\mathcal{T})) \mid \sigma \models \varphi\}|}{|L_k((2^I)^\omega)|} \geq 1 - \varepsilon.$$

We call $\mathcal{T}$ an $\varepsilon$-$k$-approximation of $\varphi$.

Theorem 7. For a specification given as a deterministic parity automaton $P$, a bound $k$, and an error rate $0 < \varepsilon < 1$, checking whether there is an implementation $\mathcal{T}$ such that $\mathcal{T} \models^{\varepsilon}_{k} P$ can be done in time polynomial in $|P|$ and exponential in $k$.


Proof. For a given $\varepsilon$ and $k$, we construct a nondeterministic parity tree automaton $N$ that accepts all $\varepsilon$-$k$-approximations with respect to $L(P)$. For $\varepsilon$, we can compute the minimal number $m$ of lassos from $L_k((2^I)^\omega)$ on which an $\varepsilon$-$k$-approximation has to satisfy the specification. In its initial state, the automaton $N$ guesses $m$ lassos and accepts a transition system if it does not violate the specification on any of these lassos. The latter check is done by following the structure of the automaton constructed for $P$ using Theorem 4. In order to check whether there is an $\varepsilon$-$k$-approximation for $P$, we solve the emptiness game of $N$. The size of $N$ is $(2^k)^{m+1} \cdot |P|$.
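The number $m$ used in the proof follows from $\varepsilon$ and the number of distinct input lassos. Under the "at least $1 - \varepsilon$" reading of Definition 3, $m = \lceil (1 - \varepsilon) \cdot |L_k((2^I)^\omega)| \rceil$. The sketch below computes it with integer arithmetic and passes $\varepsilon$ as a fraction so that no floating-point rounding creeps in. The function name and the fraction encoding are our own choices.

```c
#include <assert.h>

/* Minimal number m of input lassos on which an eps-k-approximation must
   satisfy the specification, i.e. ceil((1 - eps) * total), where
   eps = eps_num / eps_den and `total` is the number of distinct input
   lassos of length k. Uses integer arithmetic only. */
long min_satisfied_lassos(long total, long eps_num, long eps_den) {
    assert(total >= 0 && eps_den > 0 && eps_num >= 0 && eps_num <= eps_den);
    long num = total * (eps_den - eps_num);  /* (1 - eps) * total * eps_den */
    return (num + eps_den - 1) / eps_den;    /* ceiling division */
}
```

For example, with 6 distinct lassos and $\varepsilon = 1/4$, this gives $m = \lceil 4.5 \rceil = 5$.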


6.1 Symbolic Approach 


In the following, we present a symbolic approach for finding $\varepsilon$-$k$-approximations
based on maximum model counting. We show that we can build a constraint 
system and apply a maximum model counting algorithm to compute a transition 
system that satisfies a specification for a maximum number of input sequences. 


Definition 4 (Maximum Model Counting [5]). Let $X$, $Y$ and $Z$ be sets of propositional variables and $\phi$ be a formula over $X$, $Y$ and $Z$. Let $x$ denote an assignment to $X$, $y$ an assignment to $Y$, and $z$ an assignment to $Z$. The maximum model counting problem for $\phi$ over $X$ and $Y$ is computing a solution for

$$\max_{x}\ \#y.\,\exists z.\,\phi(x, y, z).$$
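For intuition only, the quantity in Definition 4 can be computed by exhaustive enumeration on tiny instances. The C sketch below encodes assignments as bit masks. It is not how MaxCount [5] works, since MaxCount targets large instances. The example formula `subset_phi` is hypothetical and was chosen so that the answer is easy to verify by hand.

```c
#include <assert.h>
#include <stdbool.h>

typedef bool (*formula_fn)(unsigned x, unsigned y, unsigned z);

/* Exhaustive maximum model counting: max over x of #{ y | exists z. phi(x,y,z) },
   with x, y, z ranging over nx-, ny- and nz-bit masks. The first maximizing x
   is written to *best_x. Exponential in nx + ny + nz, so only for tiny inputs. */
unsigned long max_model_count(formula_fn phi, int nx, int ny, int nz,
                              unsigned *best_x) {
    assert(nx >= 0 && nx < 16 && ny >= 0 && ny < 16 && nz >= 0 && nz < 16);
    unsigned long best = 0;
    *best_x = 0;
    for (unsigned x = 0; x < (1u << nx); x++) {
        unsigned long count = 0;
        for (unsigned y = 0; y < (1u << ny); y++)
            for (unsigned z = 0; z < (1u << nz); z++)
                if (phi(x, y, z)) { count++; break; }  /* one witness z suffices */
        if (count > best) { best = count; *best_x = x; }
    }
    return best;
}

/* Hypothetical example: y is a subset of x, witnessed by z = x without y. */
bool subset_phi(unsigned x, unsigned y, unsigned z) {
    return (y | z) == x && (y & z) == 0;
}
```

With two bits each for x, y, and z, `max_model_count(subset_phi, 2, 2, 2, &bx)` returns 4 and sets `bx` to 3: choosing x = 11 in binary admits all four subsets y.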


For a specification $\varphi$ and bounds $k$ and $n$ on the length of the lassos and the size of the system, respectively, we can compute an $\varepsilon$-$k$-approximation for $\varphi$ by applying a maximum model counting algorithm to the constraint system given below. It encodes transition systems of size $n$ together with an input lasso of length $k$ on which the system satisfies $\varphi$.


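$$
\begin{array}{ll}
\{\tau_{t,i,t'} \mid t,t' \in T,\ i \in 2^I\} \cup \{o_t \mid t \in T,\ o \in O\} & (11)\\
\{i_j \mid i \in I,\ 0 \le j < k\} \cup \{l_j \mid 0 \le j < k\} & (12)\\
\{z^{j'}_{i,j} \mid i \in I,\ 0 \le j, j' < k\} & (13)\\
\{o_j \mid o \in O,\ 0 \le j < n \cdot k\} & (14)\\
\{t_j \mid t \in T,\ 0 \le j < n \cdot k\} & (15)\\
\{\hat{l}_j \mid 0 \le j < n \cdot k\} & (16)\\
\varphi_{det} \wedge \varphi_{lasso} \wedge \varphi^{n}_{k} \wedge [\![\varphi]\!]^{m}_{0} \wedge [\![k]\!] & (17)
\end{array}
$$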
To check the existence of an $\varepsilon$-$k$-approximation, we maximize over the set of assignments to the variables that define the transition system (line 11) and count over the variables that define input sequences of the environment given by lassos of length $k$. As two input lassos of the same length may induce the same infinite input sequence, we count over auxiliary variables that represent unrollings of the lassos instead of counting over the input propositions themselves (line 13).

The formulas $\varphi_{det}$, $\varphi_{lasso}$, $\varphi^{n}_{k}$ and $[\![\varphi]\!]^{m}_{0}$ are defined as in the previous section. The formula $[\![k]\!]$ is defined over the variables in line (13). It ensures that input lassos representing the same infinite sequence are not counted twice, by unrolling each lasso to size $2k$.
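The effect of the unrolling can be checked by brute force for a single Boolean input. The sketch below enumerates all lassos of length $k$ (a word of $k$ letters plus a loop start $j'$), unrolls each one to length $2k$ with the index function $\Delta$, and counts the distinct unrollings. Like the encoding, it assumes that two lassos of length $k$ denote the same infinite word exactly when their unrollings to length $2k$ coincide. Restricting to one input proposition and to $k \le 16$ is our own simplification.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Lasso position read at unrolled position j (same case split as Delta). */
static int unroll_index(int j, int k, int jp) {
    return j < k ? j : (j - k) % (k - jp) + jp;
}

static int cmp_u64(const void *a, const void *b) {
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Counts the distinct infinite words over one Boolean input proposition that
   are represented by lassos of length exactly k, identifying two lassos when
   their unrollings to length 2k coincide. Returns -1 on allocation failure. */
long count_distinct_lassos(int k) {
    assert(k >= 1 && k <= 16);
    size_t total = ((size_t)1 << k) * (size_t)k;  /* words times loop starts */
    uint64_t *keys = malloc(total * sizeof *keys);
    if (keys == NULL) return -1;
    size_t n = 0;
    for (uint64_t word = 0; word < ((uint64_t)1 << k); word++)
        for (int jp = 0; jp < k; jp++) {
            uint64_t key = 0;  /* bit j = input value at unrolled position j */
            for (int j = 0; j < 2 * k; j++)
                key |= ((word >> unroll_index(j, k, jp)) & 1u) << j;
            keys[n++] = key;
        }
    qsort(keys, n, sizeof *keys, cmp_u64);
    long distinct = 0;
    for (size_t i = 0; i < n; i++)
        if (i == 0 || keys[i] != keys[i - 1]) distinct++;
    free(keys);
    return distinct;
}
```

For $k = 2$, the eight lassos collapse to six distinct words: $0^\omega$, $1^\omega$, $(01)^\omega$, $(10)^\omega$, $01^\omega$ and $10^\omega$.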


Theorem 8. For a specification given as an LTL formula $\varphi$, bounds $k$ and $n$, and an error rate $\varepsilon$, the propositional formula $\phi$ defined above is of size $O(|\varphi| \cdot n^2 + k^2)$. The number of variables of $\phi$ is equal to $n \cdot (n \cdot 2^{|I|} + |O|) + k \cdot (k \cdot |I| + |I| + 1) + n \cdot k \cdot (|O| + n + 2)$.


7 Experimental Results 


We implemented the symbolic encodings for the exact and approximate synthesis 
methods, and evaluated our approach on a bounded version of the greedy arbiter 
specification given in Sect. 1, and another specification of a round-robin arbiter. 
The round-robin arbiter is defined by the specification: 


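$$\Box\Diamond w \rightarrow \big(\Box\Diamond g_1 \wedge \Box\Diamond g_2 \wedge \Box(\neg w \rightarrow (\neg g_1 \wedge \neg g_2)) \wedge \Box(\neg g_1 \vee \neg g_2)\big)$$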


This specification is realizable, with transition systems of size at least 4. We used our implementation to check whether we can find approximate solutions with smaller sizes. We used the tool CAQE [10] for solving the QBF instances and the tool MaxCount [5] for solving the approximate synthesis instances.


Table 1. Experimental results for the symbolic approaches. The rate in the approxi- 
mate approach is the rate of input lassos on which the specification is satisfied. 


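| Spec. | Proc. | #States | Bound | Result | #Gates | #∀ | #∃ | QBF time | #Max | #Count | Rate | MaxCount time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Round-Robin Arbiter | 2 | 2 | 4 | Unreal | 15556 | 48 | 12 | 9.91 s | 12 | 8 | 0.5 | 26 s |
| Round-Robin Arbiter | 2 | 3 | 2 | Unreal | 5338 | 40 | 24 | 2.45 s | 24 | 4 | 0.88 | 161 s |
| Round-Robin Arbiter | 2 | 4 | 2 | Real | 13414 | 60 | 12 | 12.15 s | 40 | 4 | 0.88 | 283 s |
| Greedy Arbiter | 1 | 2 | 2 | Real | 1597 | 20 | 10 | 0.41 s | 10 | 4 | 1.0 | 0.79 s |
| Greedy Arbiter | 1 | 2 | 3 | Unreal | 4749 | 30 | 10 | 1.95 s | 10 | 6 | 0.88 | 3.86 s |
| Greedy Arbiter | 1 | 3 | 3 | Unreal | 16861 | 48 | 21 | 17.26 s | 21 | 6 | 0.88 | 20.83 s |
| Greedy Arbiter | 1 | 4 | 3 | Real | 43692 | 78 | 36 | 3 min 7.44 s | 36 | 6 | 1.0 | 2 min 43 s |
| Greedy Arbiter | 1 | 4 | 4 | - | 169829 | 104 | 36 | TO | 36 | 8 | - | TO |
| Greedy Arbiter | 2 | 4 | 2 | Real | 24688 | 62 | 72 | 1 min 24 s | 72 | 6 | - | TO |
| Greedy Arbiter | 2 | 4 | 3 | Unreal | 103433 | 93 | 72 | 27 min 15.2 s | 72 | 12 | - | TO |
| Greedy Arbiter | 3 | 2 | 2 | Unreal | 3985 | 93 | 72 | 1.39 s | 38 | 8 | 0.65 | 4.18 s |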
The results are presented in Table 1. As usual in synthesis, the size of the instances grows quickly as the size bound and number of processes increase. Inspecting the encoding constraints shows that the constraint for the specification is responsible for more than 80% of the gates in the encoding. The results show that, using the approach we proposed, we can synthesize implementations for unrealizable specifications by bounding the environment. The results for the approximate synthesis method further demonstrate that, for the unrealizable cases, one can still obtain approximate implementations that satisfy the specification on a large number of input sequences.


8 Conclusion 


In many cases, the unrealizability of a specification is due to the assumption 
that the environment has unlimited power in producing inputs to the system. 
In this paper, we have investigated the problem of synthesizing implementations 
under bounded environment behaviors. We have presented algorithms for solv- 
ing the synthesis problem for bounded lassos and the synthesis of approximate 
implementations that satisfy the specification up to a certain rate. 

We have also provided polynomial encodings of the problems into quantified Boolean formulas and maximum model counting instances. Our experiments demonstrate that the approach is feasible in principle. They also show that the instances can quickly become large. While this is a common phenomenon in synthesis, there is clearly a lot of room for optimization and for experimentation with both solvers for quantified Boolean formulas and solvers for maximum model counting.
Abstract. Programs with arrays are ubiquitous. Automated reasoning about arrays necessitates discovering properties about ranges of elements at certain program points. Such properties are formally specified by universally quantified formulas, which are difficult to find and difficult to prove inductive. In this paper, we propose an algorithm based on an enumerative search that discovers quantified invariants in stages. First, by exploiting the program syntax, it identifies ranges of elements accessed in each loop. Second, it identifies potentially useful facts about individual elements and generalizes them to hypotheses about entire ranges. Finally, by applying recent advances in SMT solving, the algorithm filters out wrong hypotheses. The combination of properties is often enough to prove that the program meets a safety specification. The algorithm has been implemented in a solver for Constrained Horn Clauses, FREQHORN, and extended to deal with multiple (possibly nested) loops. We show that FREQHORN advances the state of the art on a wide range of public array-handling programs.


1 Introduction 


Formally verifying programs against safety specifications is difficult. This problem worsens in the presence of data structures like lists, arrays, and maps, which are ubiquitous in real-world applications. For instance, proving an array-handling program safe often requires discovering an inductive invariant that is universally quantified over ranges of array elements. Such invariants help to prove the unreachability of error states independently of the size of the array. However, the majority of invariant synthesis approaches are limited to quantifier-free numerical invariants. This paper contributes an effective technique to discover quantified invariants over arrays and linear integer arithmetic.

Syntax-guided techniques [3] have recently been applied to synthesize quantifier-free numerical invariants [15-17,34] in the approach called FREQHORN. In a nutshell, FREQHORN collects various statistics from the syntactical patterns occurring in the program's source code and uses them to construct a


© The Author(s) 2019 
I. Dillig and S. Tasiran (Eds.): CAV 2019, LNCS 11561, pp. 259-277, 2019. 
set of formal grammars that specify a search space for invariants. It is often sufficient to perform an enumerative search over the formulas produced from these
grammars and identify a set of suitable inductive invariants among them using 
an off-the-shelf solver for Satisfiability Modulo Theories (SMT). The presence 
of arrays complicates this reasoning in a few respects: it is hard to find suitable 
candidates and difficult to prove them inductive. 

In this paper, we present a novel technique that extends the approach of 
enumerative search in general, and its instantiation in FREQHORN in particular, 
to reason about quantifiers. It discovers invariants over arrays in multiple stages. 
First, by exploiting the program syntax, it identifies ranges of elements accessed 
in each loop. Second, it identifies potentially useful facts about individual ele- 
ments and generalizes them to hypotheses about entire ranges. The SMT-based 
validation of candidates, which are quantified formulas, is often inexpensive as 
they are constructed using the same syntactic patterns that appear in the source 
code. Furthermore, for supporting certain corner cases, our approach allows spec- 
ifying additional rules that help in generalizing learned properties. The combi- 
nation of properties proven inductive by an SMT solver is often enough to prove 
that the program meets a safety specification. 

We show that FREQHORN advances the state of the art on a selection of array-handling programs from SV-COMP¹ and the literature. For instance, it can prove completely automatically that an array is monotone after applying a sorting algorithm. Furthermore, FREQHORN is able to discover quantifier-free invariants over integer variables in the program and use them as inductive relatives while checking inductiveness of quantified candidates over arrays, and vice versa.

While a detailed discussion of the related work comes later in the paper 
(Sect. 6), it is noteworthy that being syntax-guided crucially helps us overcome 
several limitations of other techniques to verify array-handling programs [2,9, 
11,35]. Most of them avoid inferring quantified invariants explicitly and thus 
do not produce checkable proofs. As a result, tools are fragile and in practice 
often output false positives (see Sect. 5 for concrete results). By comparison,
our approach never produces false positives, and its results can be validated by 
existing SMT solvers. 

The core contributions made through this work are: 

— a novel syntax-guided approach to generate universally quantified invariants 
for programs manipulating arrays; 

— an algorithm and its fully automated implementation; and 

— a thorough experimental evaluation comparing our technique with state-of- 
the-art in verification of array-handling programs. 


The rest of the paper is structured as follows. In Sect. 2, we give background and notation and illustrate our approach on an example. Our main contributions are then presented in Sect. 3 (main algorithm) and Sect. 4 (important design choices). In Sect. 5, we show the evaluation and comparison with the state of the art. Finally, the related work and conclusion complete the paper in Sects. 6 and 7, respectively.


1 Software Verification Competition, http://sv-comp.sosy-lab.org/. 
2 Background


The Satisfiability Modulo Theories (SMT) task is to decide whether there is an assignment $m$ of values to variables in a first-order logic formula $\varphi$ that makes it true. We write $\varphi \implies \psi$ if every satisfying assignment to $\varphi$ is also a satisfying assignment to some formula $\psi$. By Expr we denote the space of all possible quantifier-free formulas in our background theory, and by Vars a range of possible variables.


2.1 Programs as Constrained Horn Clauses 


To guarantee expected behaviors, programs require proofs, such as inductive 
invariants, ranking functions, or recurrence sets. It is becoming increasingly pop- 
ular to consider a verification task as a proof synthesis task which is formulated 
as a system of SMT formulas involving unknown predicates, also known as constrained Horn clauses (CHC). The synthesis goal is to discover a suitable interpretation of all unknown predicates that makes all CHCs true. CHCs offer the
advantages of flexibility and modularity in designing verifiers for various systems 
and languages. CHCs can be constructed in a way that captures the operational 
semantics of a language in question, and an off-the-shelf CHC solver can be used 
for solving the resulting formulas. 


Definition 1. A linear constrained Horn clause (CHC) over a set of uninterpreted relation symbols R is a formula in first-order logic that has the form of one of three implications (called respectively a fact, an inductive clause, and a query):

$$\varphi(\vec{x}_1) \implies inv_1(\vec{x}_1)$$
$$inv_1(\vec{x}_1) \wedge \varphi(\vec{x}_1, \vec{x}_2) \implies inv_2(\vec{x}_2)$$
$$inv_1(\vec{x}_1) \wedge \varphi(\vec{x}_1) \implies \bot$$

where $inv_1, inv_2 \in R$ are uninterpreted symbols, $\vec{x}_1, \vec{x}_2$ are vectors of variables, and $\varphi$, called a body, is a fully interpreted formula (i.e., $\varphi$ does not have applications of $inv_1$ or $inv_2$).


For a CHC $C$, by $src(C)$ we denote an application of $inv \in R$ in the premise of $C$ (if $C$ is a fact, we write $src(C) = \top$). Similarly, by $dst(C)$ we denote an application of $inv \in R$ in the conclusion of $C$ (if $C$ is a query, we write $dst(C) = \bot$). We define functions $rel$ and $args$ such that for each $inv(\vec{x})$, $rel(inv(\vec{x})) = inv$ and $args(inv(\vec{x})) = \vec{x}$. For a CHC $C$, by $body(C)$ we denote the body (i.e., $\varphi$) of $C$.
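To fix the vocabulary, the following C sketch records only the shape of a linear CHC, namely $rel(src(C))$ and $rel(dst(C))$, and omits the bodies. The numbering of relation symbols and the sentinels `TOP` and `BOT` for $\top$ and $\bot$ are our own encoding. The table `fig2` mirrors rules A to G of Fig. 2, with $inv_1, inv_2, inv_3$ numbered 0, 1, 2. The predicate `chc_is_inductive` anticipates the notion of an inductive CHC defined later in this section.

```c
#include <assert.h>
#include <stddef.h>

/* Relation symbols are numbered 0, 1, 2, ...; TOP and BOT encode
   src(C) = T (C is a fact) and dst(C) = _|_ (C is a query). */
enum { TOP = -1, BOT = -2 };

typedef struct {
    char name;  /* rule label as in Fig. 2 */
    int src;    /* rel(src(C)), or TOP */
    int dst;    /* rel(dst(C)), or BOT */
} chc_shape;    /* bodies are omitted in this sketch */

int chc_is_fact(const chc_shape *c)  { return c->src == TOP; }
int chc_is_query(const chc_shape *c) { return c->dst == BOT; }

/* rel(src(C)) = rel(dst(C)) = inv for some relation symbol inv */
int chc_is_inductive(const chc_shape *c) {
    return c->src >= 0 && c->src == c->dst;
}

/* Shape of the CHC system of Fig. 2, with inv1, inv2, inv3 numbered 0, 1, 2. */
static const chc_shape fig2[] = {
    {'A', TOP, 0}, {'B', 0, 0}, {'C', 0, 1}, {'D', 1, 1},
    {'E', 1, 2},   {'F', 2, 2}, {'G', 2, BOT},
};

size_t count_inductive(const chc_shape *cs, size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (chc_is_inductive(&cs[i])) count++;
    return count;
}
```

On `fig2`, exactly the three loop-body rules B, D, and F are inductive, A is the only fact, and G is the only query.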


Example 1. Figure 1 gives a program in the C programming language that han- 
dles two integer arrays, A and B, both of an unknown size N. The A array has 
unknown content, and the program first identifies a value m which is smaller or 
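#include <assert.h>
#include <stdlib.h>

/* Nondeterministic inputs, assumed to be provided by the verification harness. */
extern int nondetInt(void);
extern int *nondetArray(int n);

int main(void) {
  int N = nondetInt();
  int *A = nondetArray(N);

  int m = 0;  /* becomes the minimum of 0 and all elements of A */
  for (int i = N - 1; i >= 0; i--) { if (m > A[i]) m = A[i]; }
  int *B = malloc(N * sizeof(int));

  /* B receives A shifted by -m, in reverse order */
  for (int i = 0; i < N; i++) { B[N - i - 1] = A[i] - m; }
  int s = 0;

  for (int i = 0; i < N; i++) { s = s + B[i]; }

  assert(s >= 0);
  return 0;
}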


Fig. 1. Example program: source code in C. 


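(A) $i' = N' - 1 \wedge m' = 0 \implies inv_1(A', i', m', N')$

(B) $inv_1(A, i, m, N) \wedge i \ge 0 \wedge m' = ite(m > A[i], A[i], m) \wedge i' = i - 1 \implies inv_1(A, i', m', N)$

(C) $inv_1(A, i, m, N) \wedge i < 0 \wedge i' = 0 \implies inv_2(A, B, i', m, N)$

(D) $inv_2(A, B, i, m, N) \wedge i < N \wedge B' = store(B, N - i - 1, A[i] - m) \wedge i' = i + 1 \implies inv_2(A, B', i', m, N)$

(E) $inv_2(A, B, i, m, N) \wedge i \ge N \wedge i' = 0 \wedge s' = 0 \implies inv_3(A, B, i', m, s', N)$

(F) $inv_3(A, B, i, m, s, N) \wedge i < N \wedge s' = s + B[i] \wedge i' = i + 1 \implies inv_3(A, B, i', m, s', N)$

(G) $inv_3(A, B, i, m, s, N) \wedge i \ge N \wedge s < 0 \implies \bot$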


Fig. 2. Example program: CHC encoding. 


equal to all elements of A (it might be either a minimal element among the content of A or 0). Then, the program populates B by values of A with m subtracted. Interestingly, the order of elements of A and B is not preserved, e.g., A[0] - m gets written to B[N - 1], and so on. Finally, the program computes the sum s of all elements in B and requires us to prove that s is never negative.

Figure 2 gives a CHC encoding of the program. The system has three uninterpreted predicates, $inv_1$, $inv_2$, and $inv_3$, corresponding to invariants at the heads of the three loops. The primed variables correspond to modified variables. Rules B,
D, and F encode the loop bodies, and the remaining rules encode the fragments 
of code before, after, or between the loops. In particular, rule G ensures that 
after the third loop has terminated, a program state with a negative value of s 
is unreachable. Before we describe how our technique solves this CHC system 
(see Sect. 2.2), we briefly introduce the notion of satisfiability of CHCs. 


Definition 2. Given a set of uninterpreted relation symbols R and a set S of 
CHCs over R, we say that S is satisfiable if there exists an interpretation that 
assigns to each n-ary symbol $inv \in R$ a relation over n-tuples and makes all
implications in S valid. 


In the paper, we assume that a relation assigned by an interpretation is represented by a formula $\psi$ over at most $n$ free variables.

We call a CHC C inductive when rel(src(C)) = rel(dst(C)) = inv for some 
inv. While accessing an array in a loop, we assume the existence of an integer 
counter variable. More formally: 
Definition 3. Let $C$ be an inductive CHC, $\vec{x} = args(src(C))$, and $\vec{x}' = args(dst(C))$. We say that $C$ is array-handling if there exist numbers $c$ and $a$ such that (1) $1 \le c \le |\vec{x}|$ and $1 \le a \le |\vec{x}|$; (2) $\vec{x}[c]$ (and consequently its "primed copy" $\vec{x}'[c]$) has type integer; (3) either of these implications holds:

$$body(C) \implies \vec{x}[c] < \vec{x}'[c] \quad (1)$$

$$body(C) \implies \vec{x}[c] > \vec{x}'[c] \quad (2)$$

(4) $\vec{x}[a]$ (and consequently $\vec{x}'[a]$) has type array; and (5) there is an access function $f$ that identifies a relationship between an access to $\vec{x}[a]$ in $body(C)$ and $\vec{x}[c]$.


2.2 Illustrating Example 


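The CHC system in Fig. 2 has a solution, indicating that the program meets its specification. In particular:

$$inv_1 \mapsto \forall j.\ i < j < N \implies m \le A[j]$$

$$inv_2 \mapsto (\forall j.\ 0 \le j < N \implies m \le A[j]) \wedge (\forall j.\ 0 \le j < i \implies B[N - j - 1] = A[j] - m)$$

$$inv_3 \mapsto (\forall j.\ 0 \le j < N \implies m \le A[j]) \wedge (\forall j.\ 0 \le j < N \implies B[N - j - 1] = A[j] - m) \wedge s \ge 0$$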


The interpretation of $inv_1$ means that as the first loop progresses (i.e., all elements A[N - 1], A[N - 2], ..., A[i + 1] are sequentially considered), the value of m is always smaller than or equal to all the considered elements. Thus, we refer to the interpretation of $inv_1$ as a progress lemma. When the first loop has terminated, clearly, this property holds for all elements from A[0] to A[N - 1]. Because A passes through the second loop without any changes, the interpretation of $inv_1$ gets finalized (thus, it becomes a finalized lemma) and added to an interpretation of $inv_2$.

Additionally, the interpretation of $inv_2$ gets a relational fact about pairs of elements A[0] and B[N - 1], A[1] and B[N - 2], ..., A[i - 1] and B[N - i], which again appears as a progress lemma and then gets finalized in an interpretation of $inv_3$. With these two quantified invariants, one about all elements of A and one relating pairs of elements of A and B, it is possible to derive the remaining lemma in the interpretation of $inv_3$, namely $s \ge 0$, which concludes the proof.
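The interpretations above can be sanity-checked on concrete inputs. This is only testing, not a proof. The C sketch below executes the program of Fig. 1 on a given array and evaluates the three interpretations at every visit of the corresponding loop head, including the final visit where the loop condition fails. The function name is our own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Runs the program of Fig. 1 on the array A of length N and checks the
   interpretations of inv1, inv2 and inv3 at every visit of the respective
   loop head. Returns false as soon as one of them is violated. */
bool fig1_invariants_hold(const int *A, int N) {
    int m = 0, i = N - 1;
    for (;;) {  /* loop 1, head: forall j. i < j < N -> m <= A[j] */
        for (int j = i + 1; j < N; j++)
            if (m > A[j]) return false;
        if (i < 0) break;
        if (m > A[i]) m = A[i];
        i--;
    }
    int *B = malloc((size_t)(N > 0 ? N : 1) * sizeof *B);
    if (B == NULL) return false;
    bool ok = true;
    for (i = 0;; i++) {  /* loop 2 */
        for (int j = 0; j < N; j++)  /* forall j. 0 <= j < N -> m <= A[j] */
            if (m > A[j]) ok = false;
        for (int j = 0; j < i; j++)  /* forall j. 0 <= j < i -> B[N-j-1] == A[j] - m */
            if (B[N - j - 1] != A[j] - m) ok = false;
        if (!ok || i >= N) break;
        B[N - i - 1] = A[i] - m;
    }
    int s = 0;
    for (i = 0; ok; i++) {  /* loop 3 */
        for (int j = 0; j < N; j++)  /* both quantified facts over 0 <= j < N */
            if (m > A[j] || B[N - j - 1] != A[j] - m) ok = false;
        if (s < 0) ok = false;       /* s >= 0 */
        if (!ok || i >= N) break;
        s += B[i];
    }
    free(B);
    return ok;
}
```

For the array {3, -2, 5, 0}, the first loop ends with m = -2, B becomes {2, 7, 0, 5}, and all three interpretations hold at every loop-head visit.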


3 Invariants via Enumerative Search 


In this work, we aim at discovering a solution for a CHC system $S$ over a set of uninterpreted symbols $R$ enumeratively, i.e., by guessing a candidate formula for each $inv \in R$, substituting it for all CHCs $C \in S$, and checking their validity.
3.1 Quantifier-Free Invariants


We build on top of an algorithm called FREQHORN, recently proposed in [17]. Its key insight is an automatic construction of a set of formal grammars G(inv) for each $inv \in R$ based on either source code, program behaviors, or both. Importantly, these grammars are conjunction-free: they cannot be used to produce a conjunction of clauses and can give rise to only a finite number of formulas, potentially related to invariants (otherwise, the approach does not guarantee strong convergence). Since invariants are often represented by a conjunction of lemmas, FREQHORN attempts to sample (i.e., recursively apply production rules) each lemma from a grammar separately, until a combination of them is sufficient for inductiveness and safety, or the search space is exhausted. FREQHORN relies on an SMT solver to filter out unsuccessfully sampled lemmas.

The construction of formal grammars is biased by the syntax of CHC encod- 
ing. First, FREQHORN collects a set of Seeds by converting the body of each 
CHC to a Conjunctive Normal Form, extracting, and normalizing each conjunct. 
Then, the set of seeds could be optionally replenished by a set of behavioral seeds 
and bounded proofs. They are constructed respectively from the concrete values 
of variables obtained from actual program runs, and Craig interpolants from 
unsatisfiable finite unrollings of the CHC systems. Finally, the production rules 
are created in a way to enable producing seeds and also their mutants (i.e., 
syntactically similar formulas to seeds). In general, no specific restriction on 
a grammar-construction method is imposed; so in practice, the grammars are 
allowed to be more (or less) general to enable a broader (or more focused) search 
space for invariants. 
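The loop of sampling lemmas from a finite, conjunction-free grammar and filtering them with a checker can be illustrated on a toy example. The C sketch below is not FREQHORN. The following are all hypothetical:

- the program `i = 0; while (i < N) i++;` with precondition $N \ge 0$;
- the hand-written grammar of atoms `a op b` over the terms i, N, and 0;
- the safety query requiring $i = N$ and $i \ge 0$ at exit.

An exhaustive check over integers in $[-6, 6]$ stands in for the SMT solver, so its verdicts are evidence rather than proofs. Each candidate is tried once, which plays the role of blocking. A candidate is kept if it holds initially and is inductive relative to the lemmas found so far.

```c
#include <assert.h>
#include <stdbool.h>

#define BOUND 6        /* toy check over [-BOUND, BOUND]; stands in for an SMT solver */
#define MAX_LEMMAS 32

/* Candidate atom "lhs op rhs"; terms: 0 = i, 1 = N, 2 = the constant 0;
   ops: 0 = "<", 1 = "<=", 2 = "==", 3 = "!=". */
typedef struct { int lhs, op, rhs; } atom;

static int term(int t, int i, int n) { return t == 0 ? i : (t == 1 ? n : 0); }

static bool eval(atom a, int i, int n) {
    int x = term(a.lhs, i, n), y = term(a.rhs, i, n);
    switch (a.op) {
    case 0:  return x < y;
    case 1:  return x <= y;
    case 2:  return x == y;
    default: return x != y;
    }
}

static bool all_hold(const atom *ls, int k, int i, int n) {
    for (int t = 0; t < k; t++)
        if (!eval(ls[t], i, n)) return false;
    return true;
}

/* Fact: i = 0 /\ N >= 0.  Inductive clause: i < N /\ i' = i + 1.
   The candidate must hold initially and be preserved, assuming the
   lemmas learned so far (relative inductiveness). */
static bool is_lemma(atom cand, const atom *ls, int k) {
    for (int n = -BOUND; n <= BOUND; n++) {
        if (n >= 0 && !eval(cand, 0, n)) return false;
        for (int i = -BOUND; i <= BOUND; i++)
            if (all_hold(ls, k, i, n) && eval(cand, i, n) && i < n &&
                !eval(cand, i + 1, n))
                return false;
    }
    return true;
}

/* Query: at exit (i >= N) it must hold that i == N and i >= 0. */
static bool is_safe(const atom *ls, int k) {
    for (int n = -BOUND; n <= BOUND; n++)
        for (int i = -BOUND; i <= BOUND; i++)
            if (all_hold(ls, k, i, n) && i >= n && (i != n || i < 0))
                return false;
    return true;
}

/* Tries every atom of the finite grammar once; returns the number of
   lemmas stored in ls once they are safe, or -1 if the space is exhausted. */
int enumerate_lemmas(atom *ls) {
    int k = 0;
    for (int lhs = 0; lhs < 3; lhs++)
        for (int rhs = 0; rhs < 3; rhs++)
            for (int op = 0; op < 4; op++) {
                if (lhs == rhs || k == MAX_LEMMAS) continue;
                atom cand = {lhs, op, rhs};
                if (is_lemma(cand, ls, k)) {
                    ls[k++] = cand;
                    if (is_safe(ls, k)) return k;
                }
            }
    return -1;
}
```

Running `enumerate_lemmas` finds the two lemmas $i \le N$ and $0 \le i$, in that order. At that point the query becomes unreachable within the checked range, and the search stops.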


3.2 Quantified Candidates from Quantifier-Free Grammars 


The main obstacle for applying the enumerative search to generate array invari- 
ants is that the grammars do not allow quantifiers. Because grammars are con- 
structed automatically from syntactic patterns which appear in the original pro- 
grams, in the presence of arrays, we can expect expressions involving only par- 
ticular elements of arrays (such as ones accessed via a loop counter). However, 
since each loop repeats certain operations over a range of array elements, we have 
to generalize the extracted expressions about individual elements to expressions 
about entire ranges. 

Let a set of variables associated with a relation symbol inv be Vars(inv) = 
IntVars(inv) U ArrVars(inv), where IntVars(inv) and ArrVars(inv) are dis- 
joint and contain integer variables and array variables, respectively. A candidate 
quantified invariant over arrays consists of three parts: 


— a set of quantified integer variables QVars(inv), which are introduced by our 
algorithm and do not appear in Vars(inv); 

— a range formula over QVars(inv) U IntVars(inv); and 

— a quantifier-free cell property over QVars(inv) U Vars(inv). 
Algorithm 1. PREPARE(S, R)

Input: CHCs S over R
Output: formal grammars G(inv), quantified variables QVars(inv) and progressRange(inv) for each inv ∈ R

1  for each inv ∈ R do
2      Seeds ← SYNTSEEDS(inv) ∪ BEHAVSEEDS(inv);
3      cnt ← GETCOUNTERS(S, inv, ArrVars(inv));
4      if cnt ≠ ∅ then
5          QVars(inv) ← COPY(cnt);
6          progressRange(inv) ← GETRANGE(cnt);
7      G(inv) ← REPLACE(GETGRAMMAR(Seeds), cnt, QVars(inv));


Algorithm 2. SOLVEARRAYCHCS(S, R)

Input: CHCs S over R
Output: res ∈ {SAT, UNKNOWN}, Lemmas : R → 2^Expr

1  (G, QVars, progressRange) ← PREPARE(S, R);
2  for each inv ∈ R do Lemmas(inv) ← ∅;
3  while $\exists C \in S.\ \big(\bigwedge_{\ell \in Lemmas(rel(src(C)))} \ell(args(src(C))) \wedge body(C) \not\implies \bigwedge_{\ell \in Lemmas(rel(dst(C)))} \ell(args(dst(C)))\big)$ do
4      if ∀inv ∈ R. ALLBLOCKED(G(inv)) then return (UNKNOWN, ∅);
5      inv ← PICKLOOP(R);
6      if QVars(inv) = ∅ then Cand(inv) ← SAMPLE(G(inv));
7      else Cand(inv) ← ∀QVars(inv). QVars(inv) ∈ progressRange(inv) ⇒ SAMPLE(G(inv));
8      ExtCand ← EXTEND(S, {inv}, Cand, Lemmas);
9      if ∀inv' ∈ R. ExtCand(inv') = ⊤ then G(inv) ← BLOCK(G, Cand, inv);
10     else
11         for each inv' ∈ R do
12             Lemmas(inv') ← Lemmas(inv') ∪ {ExtCand(inv')};
13             G(inv') ← BLOCK(G, ExtCand, inv');
14 return (SAT, Lemmas);


A naive idea for getting a range formula and a cell property is to sample 
them separately, and then to bind them together using some QVars(inv). But it
would result in a large search space. Algorithm 1 gives a more tailored procedure 
on the matter. The central role in this process is taken by an analysis of the 
loop counters which are used to access array elements (line 3). This analysis is 
performed once for each loop before the main verification process, and thus its 
results are reused in all iterations of the verification process. 

Our algorithm identifies QVars(inv) by creating a fresh variable for each 
counter, including counters of nested loops (line 5). It then generates range 
formulas based on the results of the analysis (line 6) such that: (1) the range 
formula itself is an inductive invariant for inv, and (2) the range formula is 
expressed over the initial values of counters of inv and the counters themselves. 
Finally, only a cell property is going to be produced from the grammar G(inv), 
Algorithm 3. WEAKEN(S', R', Cand, Lemmas)

Input: CHCs S' over R', candidates Cand(inv), learned Lemmas(inv) for each inv ∈ R'
Output: weakened Cand

1  toRecheck ← ⊥;
2  for all C ∈ S' do
3      if $\bigwedge_{\ell \in Lemmas(rel(src(C)))} \ell(args(src(C))) \wedge Cand(rel(src(C)))(args(src(C))) \wedge body(C) \not\implies Cand(rel(dst(C)))(args(dst(C)))$ then
4          if ISFINALIZEDARRAYCAND(Cand, rel(dst(C))) then
5              Cand(rel(dst(C))) ← GETREGRESSCAND(Cand, rel(dst(C)));
6          else
7              Cand(rel(dst(C))) ← ⊤;
8          toRecheck ← ⊤;
9          break;
10 if toRecheck then return WEAKEN(S', R', Cand, Lemmas);
11 else return Cand;


Algorithm 4. EXTEND(S, R', Cand, Lemmas), cf. [17].

Input: CHCs S over R; R' ⊆ R, candidates Cand(inv); learned Lemmas(inv) for each inv ∈ R'
Output: extended Cand

1  Cand ← WEAKEN(S', R', Cand, Lemmas);
2  for all C ∈ S s.t. rel(src(C)) ∈ R' do
3      Cand(rel(dst(C))) ← PROPAGATE(C, Cand);
4      Cand ← EXTEND(S, R' ∪ {rel(dst(C))}, Cand, Lemmas);
5  return Cand;


constructed from the seeds (recall Sect. 3.1), in which all counters are replaced 
by the corresponding variables from QVars(inv) (line 7). Thus, the only part 
of the candidate formula where the counter can appear is the range formula. 
Once grammars, QVars, and ranges are detected, our approach proceeds to 
sample candidates and to check them with an SMT solver. The general flow of 
this algorithm is illustrated in Algorithm 2. For each $inv \in R$, it initiates a set Lemmas(inv) (line 2). Then it iteratively guesses lemmas until a combination of them is inductive and safe, or the search space is exhausted (lines 3-4).
Compared to the baseline approach from [17], our new algorithm fixes a shape for the candidates for arrays. At the same time, it still allows sampling quantifier-free candidates (line 6): they could be either formulas over counters
or any other variables in the loop, or even formulas over isolated array elements 
(if, e.g., accessed by a constant). Then (line 8), Algorithm 2 propagates can- 
didates through all available implications in CHCs using quantifier elimination 
and identifies lemmas among the candidates. This step is similar to the baseline 
approach from [17], but for completeness of presentation, we provide the pseudocode in Algorithms 3 and 4. The only differences are (1) in the implementation of the candidate propagation for array candidates and (2) in the weakening of failed candidates (both in Algorithm 3, to be discussed in Sects. 4.3 and 4.4, respectively).

Both successful and unsuccessful candidates are “blocked” from their gram- 
mars to avoid re-sampling them in the next iterations. This fact together with 
the property of grammars being conjunction-free gives the main hint for proving 
the following theorem. 


Theorem 1. Algorithm 2 always makes a finite number of iterations, and if it 
returns with SAT then the CHC system is satisfiable. 


The next section discusses a particular instantiation of important subroutines that make our invariant synthesizer effective in practice.


4 Design Choices 


Our main contribution is a completely automated algorithm for finding quan- 
tified invariants for array-handling loops. In this section, we first show how by 
exploiting the program syntax we can identify ranges of elements accessed in each 
loop (Sect. 4.1). Second, we present an intuitive justification for why our candidates can often be proved as lemmas by an off-the-shelf SMT solver (Sect. 4.2).
Finally, we extend our algorithm to handle more complicated cases of multi- 
ple loops (Sects. 4.3-4.4), and benchmarks of the tiling [9] technique, which are 
adapted from the industrial code of battery controllers (Sect. 4.5). 


4.1 Discovery of Progress Lemmas 


We start with the simplest scenario of a single loop handling just one array. 
Let S be a system of CHCs over a set of uninterpreted relation symbols R. Let 
$inv \in R$ correspond to a loop, in which arrays are accessed using some counter
variable i (counters are automatically identified by posing and solving queries of 
forms (1) and (2)). 

Recall that we do not necessarily require the array elements to be accessed 
directly by i, and we allow an access function f to identify relationships between 
i and an index of the accessed element. However, we assume that the counter is unique in the loop, as is the case in most practical applications. In
principle, our algorithm can be extended to loops handling several independent 
counters (although it is rare in practice), with the help of additionally discovered 
lemmas that describe relationships among counters. We leave a discussion about 
this to future work. 


Definition 4. A range of $inv$ and a counter $i$ is a formula over IntVars(inv) and a free variable $v$ having the form $L < v \wedge v < U$, such that either of the formulas $L < i$ or $i < U$ is a lemma for $inv$. A progress range is either a formula $L < v \wedge v < i$ (if $L < i$ is a lemma), or a formula $i < v \wedge v < U$ (if $i < U$ is a lemma).
Both ranges and progress ranges can be identified statically. Let $C_1$ and $C_2$ be two CHCs such that $inv = rel(dst(C_1)) = rel(src(C_2)) = rel(dst(C_2))$ and $inv \neq rel(src(C_1))$. It is common in practice that $body(C_1)$ identifies a symbolic bound $b$ on the initial value of $i$: it could be either a lower bound (if $i$ increments in $body(C_2)$) or an upper bound (if $i$ decrements). In this case, a progress range of $inv$ is simply computed as a lemma for $inv$ over $i$ and $b$. A range of $inv$ can often be constructed as a conjunction of the progress range with the negation of the termination condition of $body(C_2)$.²


Example 2. For the CHC encoding of the program shown in Fig. 2, the ranges of $inv_1$, $inv_2$ and $inv_3$ are all equal to $-1 < v < N$. The progress range of $inv_1$ is $i < v < N$, and the progress ranges of $inv_2$ and $inv_3$ are $-1 < v < i$.


We call candidates that use progress ranges in their left sides progress candidates:

$$\forall q.\ progressRange(inv)(q) \implies cand$$

where $q = QVars(inv)$ and $cand$ is a quantifier-free formula over QVars(inv) ∪ IntVars(inv). As can be seen from Algorithm 1, all sampled candidates are progress candidates. However, during the next steps of the algorithm (i.e., propagation and weakening) we will use other kinds of candidates (namely, regress and finalized candidates; see Sects. 4.3 and 4.4, respectively).
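The construction of a progress candidate from a seed can be illustrated on strings. The C sketch below uses a hypothetical textual representation, not FREQHORN's internal one. It replaces the counter by the quantified variable in a seed cell property, token by token, so that identifiers such as `min` are left alone. It then guards the result with a progress range that is already stated over the quantified variable.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Copies `seed` to `out`, replacing every identifier token equal to `cnt`
   by `qv` (e.g. "m <= A[i]" becomes "m <= A[j]"; "min" is left alone).
   Returns 0 on success and -1 if `out` (of capacity `cap`) is too small. */
int replace_counter(const char *seed, const char *cnt, const char *qv,
                    char *out, size_t cap) {
    assert(cap > 0);
    size_t o = 0, cl = strlen(cnt), ql = strlen(qv);
    for (size_t p = 0; seed[p] != '\0';) {
        if (isalpha((unsigned char)seed[p]) || seed[p] == '_') {
            size_t q = p;  /* scan the whole identifier token */
            while (isalnum((unsigned char)seed[q]) || seed[q] == '_') q++;
            bool match = (q - p == cl) && strncmp(seed + p, cnt, cl) == 0;
            const char *src = match ? qv : seed + p;
            size_t len = match ? ql : q - p;
            if (o + len >= cap) return -1;
            memcpy(out + o, src, len);
            o += len;
            p = q;
        } else {
            if (o + 1 >= cap) return -1;
            out[o++] = seed[p++];
        }
    }
    out[o] = '\0';
    return 0;
}

/* Builds the text of the progress candidate "forall qv. (range) -> (cell)",
   where the cell property is the seed with the counter replaced by qv. */
int progress_candidate(const char *range, const char *seed, const char *cnt,
                       const char *qv, char *out, size_t cap) {
    char cell[256];
    if (replace_counter(seed, cnt, qv, cell, sizeof cell) != 0) return -1;
    int w = snprintf(out, cap, "forall %s. (%s) -> (%s)", qv, range, cell);
    return (w < 0 || (size_t)w >= cap) ? -1 : 0;
}
```

Suppose the seed is `m <= A[i]` for $inv_1$, with counter `i`, quantified variable `j`, and progress range `i < j && j < N`. The result is `forall j. (i < j && j < N) -> (m <= A[j])`, which is the candidate used in Example 3.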

If a progress candidate is proven inductive, we call it a progress lemma. 


4.2 SMT-Based Inductiveness Checking 


We rely on recent advances of SMT solving to identify successful candidates, a 
conjunction of which is directly used to prove the desired safety specification. In 
general, solving quantified formulas for validity is a hard task; however, in certain
cases, the initiation and inductiveness queries can be simplified and reduced to 
a sequence of (sometimes even quantifier-free) formulas over integer arithmetic. 
We illustrate such proving strategy, inspired by the tiling approach [9], on the 
following example. 


Example 3. Recall the CHC system from Fig. 2. Consider a progress candidate $\forall j.\ i < j < N \Rightarrow m \le A[j]$ for $inv_1$. Checking its initiation (i.e., for CHC A) requires deciding validity of the following quantified formula:

$$i' = N' - 1 \wedge m' = 0 \implies \big(\forall j.\ i' < j < N' \Rightarrow m' \le A[j]\big) \tag{3}$$


The range formula $i' < j < N'$ simplifies to $N' - 1 < j < N'$, which no integer $j$ satisfies, making formula (3) trivially valid.


2 Thus, we explicitly require loop guards to have the form of an inequality, which is the most common array access pattern.
Checking the inductiveness of the candidate (i.e., for CHC B) boils down to solving a more complicated formula:

$$\big(\forall j.\ i < j < N \Rightarrow m \le A[j]\big) \wedge i \ge 0 \wedge m' = ite(m > A[i], A[i], m) \wedge i' = i - 1 \implies \big(\forall j.\ i' < j < N \Rightarrow m' \le A[j]\big) \tag{4}$$


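Although quantifiers are present on both sides of (4), proving its validity is not hard. Since $i' = i - 1$, the range $i' < j < N$ on the right-hand side consists of the index $j = i$ together with the old range $i < j < N$, so the query is reducible to two implications:

$$\big(\forall j.\ i < j < N \Rightarrow m \le A[j]\big) \wedge m' = ite(m > A[i], A[i], m) \implies m' \le A[i]$$

$$\big(\forall j.\ i < j < N \Rightarrow m \le A[j]\big) \wedge m' = ite(m > A[i], A[i], m) \implies \big(\forall j.\ i < j < N \Rightarrow m' \le A[j]\big)$$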


The former does not require any information about $A[i+1], \ldots, A[N-1]$, so the entire quantified conjunct is ignored, and $A[i]$ could be replaced by a fresh integer variable. The latter is trickier: it requires proving that if all elements in a range are greater than or equal to $m$, then they are also greater than or equal to $ite(m > A[i], A[i], m)$. This again is reduced to a quantifier-free formula over integer arithmetic:

$$m \le A[j] \wedge m' = ite(m > A[i], A[i], m) \implies m' \le A[j]$$


Thus, because formulas (3) and (4) are valid, the progress candidate is proved to be a progress lemma.
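FREQHORN discharges (3) and (4) with Z3. As a rough, bounded stand-in (not an SMT check), the sketch below enumerates all arrays of length at most `max_n` over a tiny value domain and tests (3), (4) and the reduced quantifier-free implication on every in-bounds state. The value domain, the bound, and the restriction to in-bounds indices are arbitrary assumptions made for illustration; passing this check is evidence, not a proof.

```python
from itertools import product

VALUES = (-1, 0, 1)  # assumed small domain for m and the array elements


def ite(cond, a, b):
    return a if cond else b


def candidate(i, m, A, N):
    """forall j. i < j < N => m <= A[j]"""
    return all(m <= A[j] for j in range(N) if i < j < N)


def initiation_holds(N, A, m0=0):
    """Formula (3): i' = N' - 1 and m' = 0 imply the candidate."""
    return candidate(N - 1, m0, A, N)


def inductive_step_holds(N, A, i, m):
    """Formula (4) for one concrete pre-state (i, m)."""
    if not (candidate(i, m, A, N) and i >= 0):
        return True  # antecedent is false, so the implication holds
    m_next = ite(m > A[i], A[i], m)
    i_next = i - 1
    return candidate(i_next, m_next, A, N)


def quantifier_free_step_holds(m, a_i, a_j):
    """Reduced formula: m <= A[j] and m' = ite(m > A[i], A[i], m) imply m' <= A[j]."""
    m_next = ite(m > a_i, a_i, m)
    return (not m <= a_j) or m_next <= a_j


def bounded_check(max_n=3):
    for N in range(max_n + 1):
        for A in product(VALUES, repeat=N):
            if not initiation_holds(N, A):
                return False
            for i in range(N):          # in-bounds indices only (assumption)
                for m in VALUES:
                    if not inductive_step_holds(N, A, i, m):
                        return False
    return all(quantifier_free_step_holds(m, a, b)
               for m, a, b in product(VALUES, repeat=3))


print(bounded_check())  # True
```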


In general, we cannot always conduct proofs that easily. Often, the prerequisite for success is the commonality of an access function $f$ in the candidate and the body of the CHC. Fortunately, our algorithm ensures that all access functions used in the candidates are borrowed directly from bodies of CHCs. Thus, in many cases, FREQHORN is able to check large numbers of candidates quickly.


4.3 Strategy of Lemma Propagation 


In this subsection, we identify a useful strategy for propagation of quantified lemmas through adjacent CHCs in the given system, inspired by [17]. Let some $inv_1 \in \mathcal{R}$ have the following lemma:

$$\forall \vec{q}.\ p(\vec{q}) \Rightarrow \ell$$

where $\vec{q} = QVars(inv_1)$, formula $p$ over $\vec{q} \cup IntVars(inv_1)$ is either a range or a progress range, and $\ell$ is over $\vec{q} \cup Vars(inv_1)$. Let then a CHC $C$ be such that $rel(src(C)) = inv_1$ and $rel(dst(C)) = inv_2$, and let its body be $\varphi(\vec{x}_1, \vec{x}_2)$.
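Definition 5. Forward propagation of lemma $\forall \vec{q}.\ p(\vec{q}) \Rightarrow \ell$ through $C$ gives a formula of the following form:

$$\forall \vec{q}.\ \big(\exists \vec{x}_1.\ p(\vec{x}_1, \vec{q}) \wedge \varphi(\vec{x}_1, \vec{x}_2)\big) \implies \big(\exists \vec{x}_1.\ \ell(\vec{x}_1, \vec{q}) \wedge \varphi(\vec{x}_1, \vec{x}_2)\big)$$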


Example 4. Recall the example from Fig. 2 and the following lemma for $inv_1$:

$$\forall j.\ i < j < N \Rightarrow m \le A[j]$$


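The body of $C$ is $i < 0 \wedge i' = 0$, thus the forward propagation gives the following formula:

$$\forall j.\ \big(\exists i.\ i < j < N \wedge i < 0 \wedge i' = 0\big) \implies \big(\exists i.\ m \le A[j] \wedge i < 0 \wedge i' = 0\big)$$

Applying quantifier elimination to both sides of the implication, we get the following formula:

$$\forall j.\ 0 \le j < N \Rightarrow m \le A[j]$$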


Note that this formula is not immediately learned as a lemma; instead, it should be checked by the solver for inductiveness. Intuitively, such a candidate represents some facts about array elements that were accessed during a loop that has terminated. If, after the propagation, the candidate turns out to use the entire range, we refer to it as a finalized candidate.
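Example 4 can be sanity-checked on concrete runs. The sketch below assumes, as in Examples 3 and 4, that the first loop of Fig. 2 starts at $i = N - 1$, $m = 0$ and applies $m' = ite(m > A[i], A[i], m)$, $i' = i - 1$ while $i \ge 0$; it executes that loop and confirms that the propagated formula over the entire range holds when control crosses $C$. It only tests the formula on examples and does not perform quantifier elimination.

```python
from itertools import product


def first_loop(A):
    """Assumed shape of Fig. 2's first loop (inv_1): scan A from the end, tracking m."""
    N = len(A)
    i, m = N - 1, 0                     # initiation, as in formula (3)
    while i >= 0:                       # exits with i < 0, matching the body of C
        m = A[i] if m > A[i] else m     # m' = ite(m > A[i], A[i], m)
        i -= 1                          # i' = i - 1
    return i, m


def finalized_candidate_holds(A):
    """After the loop: forall j. 0 <= j < N => m <= A[j] (the propagated formula)."""
    N = len(A)
    i, m = first_loop(A)
    assert i == -1                      # the loop leaves i at exactly -1
    # With i = -1, the progress range i < j < N coincides with 0 <= j < N.
    same_range = all((i < j < N) == (0 <= j < N) for j in range(-3, N + 3))
    return same_range and all(m <= A[j] for j in range(N) if 0 <= j < N)


print(all(finalized_candidate_holds(list(A))
          for n in range(5) for A in product((-2, 0, 3), repeat=n)))  # True
```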


4.4 Weakening Strategy 


Whenever a finalized candidate cannot be proven inductive, we often do not want to withdraw it completely. Instead, our algorithm runs weakening and proposes regress candidates. The main idea is to calculate a range of elements that have not yet been touched by the loop. This is the inverse of the procedure outlined in Sect. 4.1.


Definition 6. Given $inv \in \mathcal{R}$ and its $Range(inv)$ and $progressRange(inv)$ formulas, we call a regress range a formula of the following kind:

$$regressRange(inv) = Range(inv) \wedge \neg progressRange(inv)$$


We call candidates that use regress ranges in their left sides regress candidates. Clearly, a regress candidate is weaker than the corresponding finalized candidate, because its left side (the regress range) implies the range. Thus, the failure to prove inductiveness of the finalized candidate does not imply that the regress candidate is not inductive, and it makes sense to try proving it in the next iteration.
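For the decrementing loop assumed for $inv_1$ (range $-1 < v < N$, progress range $i < v < N$), Definition 6 yields the indices $0, \ldots, i$ that the loop has not yet touched. The sketch below checks this on small instances, together with the claim that a finalized candidate implies its regress counterpart; the lemma body $m \le A[v]$ and the value domain are illustrative assumptions.

```python
from itertools import product


def in_range(v, N):                  # Range(inv_1): -1 < v < N
    return -1 < v < N


def in_progress_range(v, i, N):      # progressRange(inv_1): i < v < N
    return i < v < N


def in_regress_range(v, i, N):       # Definition 6: Range and not progressRange
    return in_range(v, N) and not in_progress_range(v, i, N)


def check_regress(N, values=(0, 1)):
    for i in range(-1, N):           # every loop-head value of i for inv_1
        # Regress range = indices not yet touched by the decrementing loop: 0..i.
        if {v for v in range(N) if in_regress_range(v, i, N)} != set(range(i + 1)):
            return False
        for A in product(values, repeat=N):
            for m in values:
                finalized = all(m <= A[v] for v in range(N) if in_range(v, N))
                regress = all(m <= A[v] for v in range(N) if in_regress_range(v, i, N))
                if finalized and not regress:   # finalized must imply regress
                    return False
    return True


print(all(check_regress(n) for n in range(5)))  # True
```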


4.5 Learning from Sub-ranges 


In complicated scenarios of loops with multiple iterators, multiple array variables, or multiple access functions, the iterative process of lemma discovery might end up with a large number of quantified formulas and get lost while checking a candidate for inductiveness (recall Sect. 4.2). To overcome current limitations of existing SMT solvers, it proved useful to assist the solver when generalizing learned lemmas. In particular, a property could be learned for two sub-ranges of an array and then combined in the following way:
int N = nondetInt();
int *A = nondetArray(2*N);
int val1 = 1, val2 = 3, m = nondetInt();
for (int i = 1; i <= N; i++) {
  if (m < val2) A[2*i-2] = val2; else A[2*i-2] = 0;
  if (m < val1) A[2*i-1] = val1; else A[2*i-1] = 0;
}
for (int i = 0; i < 2*N; i++) assert(A[i] == 0 || A[i] < m);


Fig. 3. Learning from sub-ranges. 


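Lemma 1. Let some $inv \in \mathcal{R}$ have two lemmas of the following kind:

$$\forall \vec{q}.\ p_1(\vec{q}) \Rightarrow \ell \qquad \forall \vec{q}.\ p_2(\vec{q}) \Rightarrow \ell \tag{5}$$

Then, the following is also a lemma for inv:

$$\forall \vec{q}.\ p_1(\vec{q}) \vee p_2(\vec{q}) \Rightarrow \ell$$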


Example 5. Figure 3 shows a program from the tiling benchmark suite [9]. If the lemmas $\forall j.\ 0 < j \le N \Rightarrow A[2j-1] = 0 \vee A[2j-1] < m$ and $\forall j.\ 0 < j \le N \Rightarrow A[2j-2] = 0 \vee A[2j-2] < m$ are discovered, then the formula $\forall j.\ 0 \le j \le 2N - 1 \Rightarrow A[j] = 0 \vee A[j] < m$ is also a lemma.
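The index arithmetic behind Example 5 can be checked directly. Assuming the sub-range lemmas quantify over $0 < j \le N$, the sketch below confirms that the odd indices $2j - 1$ and the even indices $2j - 2$ are disjoint and together cover $0, \ldots, 2N - 1$, and it phrases Lemma 1 as an executable check over a finite domain. Both helper names are hypothetical.

```python
def subrange_indices(N):
    """Indices covered by the two sub-range lemmas of Example 5, for 0 < j <= N."""
    odd = {2 * j - 1 for j in range(1, N + 1)}
    even = {2 * j - 2 for j in range(1, N + 1)}
    return odd, even


def union_lemma(p1, p2, ell, domain):
    """Lemma 1 on a finite domain: if ell holds on p1 and on p2, it holds on p1 or p2."""
    on_p1 = all(ell(q) for q in domain if p1(q))
    on_p2 = all(ell(q) for q in domain if p2(q))
    on_union = all(ell(q) for q in domain if p1(q) or p2(q))
    return (not (on_p1 and on_p2)) or on_union


for N in range(8):
    odd, even = subrange_indices(N)
    assert odd | even == set(range(2 * N))   # together they cover 0 <= j <= 2N - 1
    assert not odd & even                    # and the two sub-ranges are disjoint
```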


5 Evaluation 


We have implemented our algorithm on top of the FREQHORN³ tool. It takes
a system of CHCs with arrays as input and performs an enumerative search as 
presented in Sect. 4. The tool uses Z3 [12] to solve SMT queries. 

We have evaluated FREQHORN on 137 satisfiable CHC-translations of pub- 
licly available C programs (whose assertions are safe) taken from the SVCOMP 
ReachSafety Array subcategory and literature. These programs include variations 
of standard array copying, initializing, maximum, minimum, sorting, and tiling 
benchmarks. Among these 137 benchmarks, 79 have a single loop, and 58 have 
multiple loops, including 7 that have nested loops. These programs are encoded 
using the theories of Arrays, Linear (LIA) and Non-linear Integer Arithmetic 
(NIA). Our experiments have been performed on an Ubuntu 18.04 machine run- 
ning at 2.5GHz and having 16GB memory, with a timeout of 100s for every 
benchmark. FREQHORN solved 129 benchmarks within the timeout, of which 73 
solved benchmarks had a single loop and 56 had multiple loops. 

We have compared our tool with SPACER (Z3 v4.8.3) [26], that implements a 
recent QUIC3 [22] algorithm, BOOSTER (v0.2) [2], VIAP (v1.0) [35], and VERI- 
ABS (v1.3.10) [11]. The last two tools performed well in the ReachSafety Array 


3 The source code and benchmarks are available at https://github.com/grigoryfedyukovich/aeval/tree/rnd.
[Fig. 4 plot panels (partially recovered): FREQHORN vs VIAP, FREQHORN vs BOOSTER]

Fig. 4. FREQHORN vs competitors. Each point in a plot represents a pair of the run 
times (sec x sec) of FREQHORN (x-axis) and a competitor (y-axis). Timeouts are placed 
on the inner dashed lines; false alarms, unsupported cases, and crashes are on the outer 
dashed lines. 


subcategory at SVCOMP 2019⁴. Figure 4 gives a comparison of FREQHORN timings against timings of these tools.⁵

Compared to the 129 benchmarks solved by FREQHORN, only 81 were solved by SPACER, 108 by VERIABS, 70 by VIAP, and 48 by BOOSTER.

FREQHORN solved 54 benchmarks on which SPACER diverged. Our intuition 
is that SPACER works poorly on programs with non-deterministic assignments 
and NIA operations, which our tool can handle. 

FREQHORN solved 27 benchmarks on which VERIABS diverged. VERIABS failed to solve programs with nested loops and programs in which array values depend on access indices. Furthermore, it incorrectly reported one of the programs as unsafe. Time-wise, FREQHORN significantly outperformed VERIABS on all benchmarks.


4 https://sv-comp.sosy-lab.org/2019/results/results-verified/.
5 The time taken for every benchmark is available at: http://bit.ly/2VS5Mtt. 
Importantly, the short time taken by FREQHORN includes the time for generating a checkable witness (a quantified invariant), something VERIABS by design cannot produce. On the other hand, VERIABS solved several benchmarks after merging loops; for these benchmarks, no quantified invariant satisfying FREQHORN's restrictions exists before this program transformation.

FREQHORN solved 60 programs on which VIAP diverged. VIAP incorrectly reported one program as unsafe. There were no programs on which FREQHORN took more time than VIAP. Finally, FREQHORN solved 83 programs on which BOOSTER diverged, and BOOSTER incorrectly reported two programs as unsafe.


6 Related Work 


Our algorithm for quantified invariant synthesis extends the prior work on checking satisfiability of CHCs [15-17], where solutions do not permit quantifiers. It works in a similar enumerate-and-check manner, but there are two crucial changes: (1) introduction of quantifiers, to formulate hypotheses over a subset of array indices, and (2) a generalization mechanism, to derive properties that may hold over the entire range of array indices.

Many existing approaches for verifying programs over arrays are extensions of well-known techniques for programs over scalar variables to quantified invariants. Examples include extending predicates with Skolem variables in predicate abstraction [30], exploiting the MCMT [19] framework in lazy abstraction with interpolants [1] and its integration with acceleration [2], and, more recently, QUIC3 [22], which extends IC3 [8, 14] to universally quantified invariants. Apart from the skeletal similarity, however, these approaches rely on orthogonal techniques.

Partitioning of arrays has also been used to infer invariants in many different 
ways. It refers to splitting an array into symbolic segments, and may be based 
on syntax [20,23,25] or semantics [10,31]. Invariants may be inferred for each 
segment separately and generalized for the entire array. The partitioning need 
not be explicit, as in [13]. However, most of these techniques (except [13,31]) are 
restricted to contiguous array segments, and work well when different loop itera- 
tions write to disjoint array locations or when the segments are non-overlapping. 
Tiling [9], a property-driven verification technique, overcomes these limitations 
for a class of programs by inferring array access patterns in loops. But identifying 
tiles of array accesses is itself a difficult problem, and the approach is currently 
based on heuristics developed by observing interesting patterns. 

There are a number of approaches that verify array programs without infer- 
ring quantified invariants explicitly. A straightforward way is to smash all array 
elements into a single memory location [4], but it is quite imprecise. Every array 
element might also be considered a separate variable, but this is not possible
with unknown array sizes. There are also techniques that abstract an array 
to a fixed number of elements, e.g. k-distinguished cell abstraction [32,33] and 
k-shrinkability [24,29]. Such abstractions usually reduce array modifying loops 
with unknown bounds to a known, small bound. It may even be possible to get 
rid of such loops altogether, by accelerating (computing transitive closures of) 
transition relations involving array updates in that loop [7]. Along similar lines, VIAP [35] resorts to reasoning with recurrences instead of loops. It translates the input program, including loops, to a set of first-order axioms, and checks if they derive the property. However, unlike ours, none of these techniques obtains quantified invariants explicitly. Besides, many of these transformations produce an abstraction of the original program, i.e., they do not preserve safety.

Alternatively, there are approaches that use sufficiently expressive templates 
to infer quantified invariants over arrays [5,21,27]. However, the templates need 
to be supplied manually. For instance, [6] uses a template space of quantified 
invariants and reduces the problem to quantifier-free invariant generation. Thus, 
universally quantified solutions for unknown predicates in a CHC system may 
be obtained by extending a generic CHC solver to handle quantified predicates. 
Learning need not be limited to user-supplied templates; one may do away with 
the templates entirely and learn only from examples and counterexamples [18]. 
Alternatively, [36] chooses a template upfront and refurbishes it with constants 
or coefficients appearing in the program source. Similarly, [28] proposes to infer 
array invariants without any user guidance or any user-defined templates or pred- 
icates. Their method is based on automatic analysis of predicates that update 
an array and allows one to generate first-order invariants, including those that 
contain alternations of quantifiers. But it does not work for nested loops. By 
comparison, our technique supports multiple as well as nested loops, enables 
candidate propagation between loops and, more importantly, generates the gram- 
mar automatically from the syntactical constructions appearing in the program’s 
source. 


7 Conclusion 


We have presented a new algorithm to synthesize quantified invariants over array 
variables, systematically accessed in loops. Our algorithm implements an enu- 
merative search that guesses invariants based on syntactic constructions which 
appear in the code and checks their initiation, inductiveness, and safety with 
an off-the-shelf SMT solver. Key insights behind our approach are that indi- 
vidual accesses to array elements performed in the loop can be generalized to 
hypotheses about entire ranges, and the existing SMT solvers can be used to val- 
idate these hypotheses efficiently. Our implementation on top of a CHC solver 
FREQHORN confirmed that such strategy is effective on a variety of practical 
examples. In the vast majority of cases, our tool outperformed competitors and provided checkable guarantees that prevented it from reporting false positives.


Acknowledgements. This work was supported in part by NSF Grant 1525936. Any 
Abstract. We consider the problem of synthesizing a program given a
probabilistic specification of its desired behavior. Specifically, we study 
the recent paradigm of distribution-guided inductive synthesis (DIGITS), 
which iteratively calls a synthesizer on finite sample sets from a given 
distribution. We make theoretical and algorithmic contributions: (i) We 
prove the surprising result that DIGITS only requires a polynomial num- 
ber of synthesizer calls in the size of the sample set, despite its ostensi- 
bly exponential behavior. (ii) We present a property-directed version of 
DIGITS that further reduces the number of synthesizer calls, drastically 
improving synthesis performance on a range of benchmarks. 


1 Introduction 


Over the past few years, progress in automatic program synthesis has touched 
many application domains, including automating data wrangling and data 
extraction tasks [2,13,15,21,22,30], generating network configurations that meet
user intents [10,29], optimizing low-level code [25,28], and more [4,14]. 

The majority of the current work has focused on synthesis under Boolean 
constraints. However, we often require the program to adhere to a probabilistic specification, e.g., a controller that succeeds with a high probability, a
decision-making model operating over a probabilistic population model, a ran- 
domized algorithm ensuring privacy, etc. In this work, we are interested in (1) 
investigating probabilistic synthesis from a theoretical perspective and (2) devel- 
oping efficient algorithmic techniques to tackle this problem. 

Our starting point is our recent framework for probabilistic synthesis called 
distribution-guided inductive synthesis (DIGITS) [1]. The DIGITS framework is 
analogous in nature to the guess-and-check loop popularized by counterexample- 
guided approaches to synthesis and verification (CEGIS and CEGAR). The key 
idea of the algorithm is reducing the probabilistic synthesis problem to a non- 
probabilistic one that can be solved using existing techniques, e.g., SAT solvers. 
This is performed using the following loop: (1) approximating the input proba- 
bility distribution with a finite sample set; (2) synthesizing a program for various 
possible output assignments of the finite sample set; and (3) invoking a proba- 
bilistic verifier to check if one of the synthesized programs indeed adheres to the 
given specification. 
© The Author(s) 2019 
DIGITS has been shown to theoretically converge to correct programs when they exist, thanks to learning-theory guarantees. The primary bottleneck of
DIGITS is the number of expensive calls to the synthesizer, which is ostensibly 
exponential in the size of the sample set. Motivated by this observation, this 
paper makes theoretical, algorithmic, and practical contributions: 


— On the theoretical side, we present a detailed analysis of DIGITS and prove 
that it only requires a polynomial number of invocations of the synthesizer, 
explaining that the strong empirical performance of the algorithm is not 
merely due to the heuristics presented in [1] (Sect. 3). 

— On the algorithmic side, we develop an improved version of DIGITS that is 
property-directed, in that it only invokes the synthesizer on instances that 
have a chance of resulting in a correct program, without sacrificing convergence. We call the new approach τ-DIGITS (Sect. 4).

— On the practical side, we implement τ-DIGITS for sketch-based synthesis and
demonstrate its ability to converge significantly faster than DIGITS. We apply 
our technique to a range of benchmarks, including illustrative examples that 
elucidate our theoretical analysis, probabilistic repair problems of unfair pro- 
grams, and probabilistic synthesis of controllers (Sect. 5). 


2 An Overview of DIGITS 


In this section, we present the synthesis problem, the DIGITS [1] algorithm, and 
fundamental background on learning theory. 


2.1 Probabilistic Synthesis Problem 


Program Model. As discussed in [1], DIGITS searches through some (infinite) 
set of programs, but it requires that the set of programs has finite VC dimension 
(we restate this condition in Sect. 2.3). Here we describe one constructive way of 
obtaining such sets of programs with finite VC dimension: we will consider sets of 
programs defined as program sketches [27] in the simple grammar from [1], where 
a program is written in a loop-free language, and “holes” defining the sketch 
replace some constant terminals in expressions.¹ The syntax of the language is
defined below: 


$$P ::= V \leftarrow E \mid \mathbf{if}\ B\ \mathbf{then}\ P\ \mathbf{else}\ P \mid P\ P \mid \mathbf{return}\ V$$


Here, $P$ is a program, $V$ is the set of variables appearing in $P$, $E$ (resp. $B$) is the set of linear arithmetic (resp. Boolean) expressions over $V$ (where, again, constants in $E$ and $B$ can be replaced with holes), and $V \leftarrow E$ is an assignment. We assume a vector $v_I$ of variables in $V$ that are inputs to the program. We


1 In the case of loop-free program sketches as considered in our program model, we can convert the input-output relation into a real arithmetic formula that is guaranteed to have finite VC dimension [12].
also assume there is a single Boolean variable $v_r \in V$ that is returned by the program.² All variables are real-valued or Boolean. Given a vector of constant values $c$, where $|c| = |v_I|$, we use $P(c)$ to denote the result of executing $P$ on the input $c$.

In our setting, the inputs to a program are distributed according to some joint probability distribution $D$ over the variables $v_I$. Semantically, a program $P$ is denoted by a distribution transformer $[\![P]\!]$, whose input is a distribution over values of $v_I$ and whose output is a distribution over $v_I$ and $v_r$.

A program also has a probabilistic postcondition, post, defined as an inequality over terms of the form $\Pr[B]$, where $B$ is a Boolean expression over $v_I$ and $v_r$. Specifically, a probabilistic postcondition consists of Boolean combinations of the form $e > c$, where $c \in \mathbb{R}$ and $e$ is an arithmetic expression over terms of the form $\Pr[B]$, e.g., $\Pr[B_1] / \Pr[B_2] > 0.75$.

Given a triple $(P, D, post)$, we say that $P$ is correct with respect to $D$ and post, denoted $[\![P]\!](D) \models post$, iff post is true on the distribution $[\![P]\!](D)$.


Example 1. Consider the set of intervals of the form $[0, a] \subseteq [0, 1]$ and inputs $x$ uniformly distributed over $[0, 1]$ (i.e., $D = \mathrm{Uniform}[0, 1]$). We can write inclusion in the interval as the following (C-style) program and consider a postcondition stating that the interval must include at least half the input probability mass:

if (0 <= x && x <= a) {
  return 1;
} else {
  return 0;
}

$$\Pr_{x \sim D}[P(x) = 1] \ge 0.5$$

Let $P_c$ denote the interval program where $a$ is replaced by a constant $c \in [0, 1]$. Observe that $[\![P_c]\!](D)$ describes a joint distribution over $(x, v_r)$ pairs, where $[0, c] \times \{1\}$ is assigned probability measure $c$ and $(c, 1] \times \{0\}$ is assigned probability measure $1 - c$. Therefore, $[\![P_c]\!](D) \models post$ if and only if $c \in [0.5, 1]$.
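The claim that $[\![P_c]\!](D) \models post$ exactly when $c \in [0.5, 1]$ can be reproduced numerically. The sketch below (helper names are ours, for illustration) computes $\Pr[P_c(x) = 1]$ in closed form and, as a cross-check, by Monte Carlo sampling; the sample count and seed are arbitrary.

```python
import random


def interval_program(c):
    """P_c from Example 1: returns 1 iff 0 <= x <= c."""
    return lambda x: 1 if 0 <= x <= c else 0


def prob_true_exact(c):
    """Under D = Uniform[0, 1], Pr[P_c(x) = 1] is the length of [0, c], i.e., c."""
    return min(max(c, 0.0), 1.0)


def prob_true_mc(c, n=100_000, seed=0):
    """Monte Carlo estimate of Pr[P_c(x) = 1] with x ~ Uniform[0, 1]."""
    rng = random.Random(seed)
    program = interval_program(c)
    return sum(program(rng.random()) for _ in range(n)) / n


def post(c):
    """The postcondition Pr[P(x) = 1] >= 0.5, evaluated in closed form."""
    return prob_true_exact(c) >= 0.5


print([c for c in (0.2, 0.5, 0.8, 1.0) if post(c)])   # [0.5, 0.8, 1.0]
```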


Synthesis Problem. DIGITS outputs a program that is approximately “similar” to a given functional specification and that meets a postcondition. This functional specification is some input-output relation which we quantitatively want to match as closely as possible: specifically, we want to minimize the error of the output program $P$ from the functional specification $\hat{P}$, defined as $Er(P) := \Pr_{x \sim D}[P(x) \ne \hat{P}(x)]$. (Note that we represent the functional specification as a program.) The postcondition is Boolean, and therefore we always want it to be true. DIGITS is guaranteed to converge whenever the space of solutions satisfying the postcondition is robust under small perturbations. The following definition captures this notion of robustness:


Definition 1 ($\alpha$-Robust Programs). Fix an input distribution $D$, a postcondition post, and a set of programs $\mathcal{P}$. For any $P \in \mathcal{P}$ and any $\alpha > 0$, denote the

2 Restricting the output to Boolean is required by the algorithm; other output types can be turned into Boolean by rewriting. See, e.g., the thermostat example in Sect. 5.
open $\alpha$-ball centered at $P$ as $B_\alpha(P) = \{P' \in \mathcal{P} \mid \Pr_{x \sim D}[P(x) \ne P'(x)] < \alpha\}$. We say a program $P$ is $\alpha$-robust if $\forall P' \in B_\alpha(P).\ [\![P']\!](D) \models post$.

We can now state the synthesis problem solved by DIGITS: 
Definition 2 (Synthesis Problem). Given an input distribution $D$, a set of programs $\mathcal{P}$, a postcondition post, a functional specification $\hat{P} \in \mathcal{P}$, and parameters $\alpha > 0$ and $0 < \varepsilon < \alpha$, the synthesis problem is to find a program $P \in \mathcal{P}$ such that $[\![P]\!](D) \models post$ and such that any other $\alpha$-robust $P'$ has $Er(P) \le Er(P') + \varepsilon$.
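Continuing Example 1 with a hypothetical specification $\hat{P} = P_{0.3}$, the error is $Er(P_c) = |c - 0.3|$, and Definition 1 reduces to a simple condition: $P_c$ is $\alpha$-robust iff $c - \alpha \ge 0.5$, because otherwise some $c' \in [0, 1]$ within distance $\alpha$ of $c$ lies below 0.5. Hence $P_{0.5}$ (error 0.2) satisfies Definition 2, since every $\alpha$-robust program has error at least $0.2 + \alpha$. The sketch below encodes these closed forms, which are specific to this interval example.

```python
def error(c, c_hat):
    """Er(P_c) for P_hat = P_{c_hat} under Uniform[0, 1]: the intervals
    [0, c] and [0, c_hat] disagree exactly on (min(c, c_hat), max(c, c_hat)]."""
    return abs(c - c_hat)


def is_correct(c):
    """[[P_c]](D) |= post, i.e., Pr[P_c(x) = 1] = c >= 0.5 (Example 1)."""
    return c >= 0.5


def is_alpha_robust(c, alpha):
    """Definition 1 for interval programs: every c' in [0, 1] with |c - c'| < alpha
    must be correct, which holds exactly when c - alpha >= 0.5."""
    return c - alpha >= 0.5


c_hat, alpha = 0.3, 0.1
grid = [k / 100 for k in range(101)]
best = min((c for c in grid if is_correct(c)), key=lambda c: error(c, c_hat))
robust_errors = [error(c, c_hat) for c in grid if is_alpha_robust(c, alpha)]
print(best, error(best, c_hat) <= min(robust_errors))   # 0.5 True
```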


2.2 A Naive DIGITS Algorithm 


Algorithm 1 shows a simplified, naive version of DIGITS, which employs a synthesize-then-verify approach. The idea of DIGITS is to utilize non-probabilistic synthesis techniques to synthesize a set of programs, and then apply a probabilistic verification step to check if any of the synthesized programs is a solution.

Specifically, this “Naive DIGITS” begins by sampling an appropriate number of inputs from the input distribution and storing them in the set $S$. Second, it iteratively explores each possible function $f$ that maps the input samples to a Boolean and invokes a synthesis oracle $O_{syn}$ to synthesize a program $P$ that implements $f$, i.e., that satisfies the set of input-output examples in which each input $x \in S$ is mapped to the output $f(x)$. Naive DIGITS then finds which of the synthesized programs satisfy the postcondition (the set res); we assume that we have access to a probabilistic verifier $O_{ver}$ to perform these computations. Finally, the algorithm outputs the program in the set res that has the lowest error with respect to the functional specification, once again assuming access to another oracle $O_{err}$ that can measure the error.

Procedure DIGITS(P̂, D, post, m)
    S ← {x_i ∼ D | i ∈ [1, ..., m]}
    progs ← ∅
    foreach f : S → {0,1} do
        P ← O_syn({(x, f(x)) | x ∈ S})
        if P ≠ ⊥ then
            progs ← progs ∪ {P}
    res ← {P ∈ progs | O_ver(P, D, post)}
    return argmin_{P ∈ res} {O_err(P)}
Algorithm 1: Naive DIGITS

Note that the number of such functions $f : S \to \{0,1\}$ is $2^{|S|}$, i.e., exponential in the size of $S$. As a “heuristic” to improve performance, the actual DIGITS algorithm as presented in [1] employs an incremental trie-based search, which we describe (alongside our new algorithm, τ-DIGITS) and analyze in Sect. 3. The naive version described here is, however, sufficient to discuss the convergence properties of the full algorithm.
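To make the exponential cost concrete, the sketch below instantiates Algorithm 1 for the interval sketch of Example 1. The three oracles are hypothetical stand-ins written for this example: $O_{syn}$ returns some consistent threshold (or `None` for ⊥), $O_{ver}$ checks $a \ge 0.5$, and $O_{err}$ is $|a - \hat{c}|$. Only labelings that are monotone in $x$ are realizable, so most of the $2^{|S|}$ synthesis queries fail.

```python
from itertools import product
import random


def o_syn(examples):
    """Hypothetical synthesis oracle for the sketch [0, a]: a consistent a, or None (bottom)."""
    pos = [x for x, b in examples if b == 1]
    neg = [x for x, b in examples if b == 0]
    if not neg:
        return 1.0                      # the widest interval labels everything 1
    lo, hi = max(pos, default=0.0), min(neg)
    if hi <= lo:
        return None                     # a 0-labelled point is not above every 1-labelled point
    return (lo + hi) / 2                # any a in [lo, hi) works; take the midpoint


def o_ver(a):
    return a >= 0.5                     # post from Example 1


def o_err(a, c_hat):
    return abs(a - c_hat)               # Er(P_a) under Uniform[0, 1]


def naive_digits(c_hat, samples):
    progs, calls = [], 0
    for labels in product((0, 1), repeat=len(samples)):   # every f : S -> {0, 1}
        calls += 1
        a = o_syn(list(zip(samples, labels)))
        if a is not None:
            progs.append(a)
    res = [a for a in progs if o_ver(a)]
    best = min(res, key=lambda a: o_err(a, c_hat)) if res else None
    return best, calls


rng = random.Random(0)
S = [rng.random() for _ in range(8)]
best, calls = naive_digits(0.3, S)
print(calls)   # 256
```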


2.3 Convergence Guarantees 


DIGITS is only guaranteed to converge when the program model $\mathcal{P}$ has finite VC dimension.³ Intuitively, the VC dimension captures the expressiveness of the set


3 Recall that this is largely a “free” assumption since, again, sketches in our loop-free grammar are guaranteed to have finite VC dimension.
of ($\{0,1\}$-valued) programs $\mathcal{P}$. Given a set of inputs $S$, we say that $\mathcal{P}$ shatters $S$ iff, for every partition of $S$ into sets $S_0 \cup S_1$, there exists a program $P \in \mathcal{P}$ such that (i) for every $x \in S_0$, $P(x) = 0$, and (ii) for every $x \in S_1$, $P(x) = 1$.


Definition 3 (VC Dimension). The VC dimension of a set of programs P is 
the largest integer d such that there exists a set of inputs S with cardinality d 
that is shattered by P. 
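For the interval programs of Example 1, shattering can be checked by brute force: a labeling is realizable iff every point labeled 0 lies strictly above every point labeled 1 (and above 0, since $0 \in [0, a]$ always). The sketch below shows that a single point in $(0, 1]$ is shattered but a pair is not, so this family has VC dimension 1. The helper names are illustrative.

```python
from itertools import product


def realizable(examples):
    """Is some interval program [0, a], a in [0, 1], consistent with all (x, label) pairs?"""
    pos = [x for x, b in examples if b == 1]
    neg = [x for x, b in examples if b == 0]
    # 0-labelled points must lie strictly above every 1-labelled point and above 0.
    return not neg or min(neg) > max(pos, default=0.0)


def shatters(points):
    """Shattering, checked over all 2^|S| labelings of the points."""
    return all(realizable(list(zip(points, labels)))
               for labels in product((0, 1), repeat=len(points)))


print(shatters([0.5]), shatters([0.3, 0.7]))   # True False
```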


We define the function $\mathrm{VCcost}(\varepsilon, \delta, d) = \frac{1}{\varepsilon}\left(4 \log_2\frac{2}{\delta} + 8 d \log_2\frac{13}{\varepsilon}\right)$ [5], which is used in the following theorem:


Theorem 1 (Convergence). Assume that there exist an $\alpha > 0$ and a program $P^*$ that is $\alpha$-robust w.r.t. $D$ and post. Let $d$ be the VC dimension of the set of programs $\mathcal{P}$. For all bounds $0 < \varepsilon < \alpha$ and $\delta > 0$, for every function $O_{syn}$, and for any $m \ge \mathrm{VCcost}(\varepsilon, \delta, d)$, with probability $\ge 1 - \delta$ we have that DIGITS enumerates a program $P$ with $\Pr_{x \sim D}[P^*(x) \ne P(x)] \le \varepsilon$ and $[\![P]\!](D) \models post$.


To reiterate, suppose $P^*$ is a correct program with small error $Er(P^*) = k$; the convergence result follows from two main points: (i) $P^*$ must be $\alpha$-robust, meaning every $P$ with $\Pr_{x \sim D}[P(x) \ne P^*(x)] < \alpha$ must also be correct, and therefore (ii) by synthesizing any $P$ such that $\Pr_{x \sim D}[P(x) \ne P^*(x)] \le \varepsilon$ where $\varepsilon < \alpha$, we obtain a correct program $P$ with error $Er(P)$ within $k + \varepsilon$ (since $Er(P) \le Er(P^*) + \Pr_{x \sim D}[P(x) \ne P^*(x)]$).
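The sample bound of Theorem 1 can be evaluated directly. The sketch below simply codes the VCcost formula stated above; for the interval family of Example 1 ($d = 1$) with $\varepsilon = \delta = 0.1$ it asks for 735 samples. The helper names are illustrative.

```python
import math


def vc_cost(eps, delta, d):
    """VCcost(eps, delta, d) = (1/eps) * (4*log2(2/delta) + 8*d*log2(13/eps))."""
    return (4 * math.log2(2 / delta) + 8 * d * math.log2(13 / eps)) / eps


def samples_needed(eps, delta, d):
    """Smallest integer m with m >= VCcost(eps, delta, d), as required by Theorem 1."""
    return math.ceil(vc_cost(eps, delta, d))


print(samples_needed(0.1, 0.1, 1))   # 735
```

The bound grows linearly in $d$ and roughly like $(1/\varepsilon)\log(1/\varepsilon)$ as $\varepsilon$ shrinks, i.e., polynomially in the accuracy parameters.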


2.4 Understanding Convergence 


The importance of finite VC dimension is due to the fact that the convergence 
statement borrows directly from probably approximately correct (PAC) learning. 
We will briefly discuss a core detail of efficient PAC learning that is relevant to 
understanding the convergence of DIGITS (and, in turn, our analysis of τ-DIGITS in Sect. 4), and refer the interested reader to Kearns and Vazirani’s book [16] for a complete overview. Specifically, we consider the notion of an ε-net, which
establishes the approximate-definability of a target program in terms of points 
in its input space. 


Definition 4 ($\varepsilon$-net). Suppose $P \in \mathcal{P}$ is a target program, and points in its input domain $X$ are distributed $x \sim D$. For a fixed $\varepsilon \in [0, 1]$, we say a set of points $S \subseteq X$ is an $\varepsilon$-net for $P$ (with respect to $\mathcal{P}$ and $D$) if for every $P' \in \mathcal{P}$ with $\Pr_{x \sim D}[P(x) \ne P'(x)] > \varepsilon$ there exists a witness $x \in S$ such that $P(x) \ne P'(x)$.


In other words, if $S$ is an $\varepsilon$-net for $P$, and if $P'$ “agrees” with $P$ on all of $S$, then $P$ and $P'$ can only differ by at most $\varepsilon$ probability mass.

Observe the relevance of $\varepsilon$-nets to the convergence of DIGITS: the synthesis oracle is guaranteed not to “fail” by producing only programs $\varepsilon$-far from some $\alpha$-robust $P^*$ if the sample set happens to be an $\varepsilon$-net for $P^*$. In fact, this observation is exactly the core of the PAC learning argument: having an $\varepsilon$-net exactly guarantees approximate learnability.

A remarkable result of computational learning theory is that whenever $\mathcal{P}$ has finite VC dimension, the probability that $m$ random samples fail to yield an $\varepsilon$-net becomes vanishingly small as $m$ increases. Indeed, the VCcost function used in Theorem 1 is a dual form of this latter result: polynomially many samples are sufficient to form an $\varepsilon$-net with high probability.
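Definition 4, and the observation that random samples form an $\varepsilon$-net more and more often as $m$ grows, can be explored on the interval example. The sketch below discretizes the program space to $c \in \{0, 0.01, \ldots, 1\}$ and draws samples uniformly from the same grid; both are assumptions made for exactness and speed, and positions and $\varepsilon$ are measured in hundredths. Helper names are illustrative.

```python
import random

GRID = range(101)   # interval programs P_c with c = k/100, k = 0..100 (assumption)


def disagree(c1, c2, x):
    """P_c1 and P_c2 (intervals [0, c]) disagree exactly when min < x <= max."""
    return min(c1, c2) < x <= max(c1, c2)


def is_eps_net(S, c, eps, candidates=GRID):
    """Definition 4 for target P_c, restricted to a finite candidate set.
    The distance between P_c and P_c2 is taken as |c - c2| hundredths,
    its value under Uniform[0, 1]."""
    for c2 in candidates:
        if abs(c2 - c) > eps and not any(disagree(c, c2, x) for x in S):
            return False
    return True


def net_rate(m, c=45, eps=10, trials=200, seed=0):
    """Fraction of m-point random samples (uniform over the grid) that form an eps-net."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        S = [rng.randint(0, 100) for _ in range(m)]
        hits += is_eps_net(S, c, eps)
    return hits / trials


print([net_rate(m) for m in (2, 10, 40)])
```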


3 The Efficiency of Trie-Based Search 


After providing details on the search strategy employed by DIGITS, we present our 
theoretical result on the polynomial bound on the number of synthesis queries 
that DIGITS requires. 


3.1 The Trie-Based Search Strategy of DIGITS 


Naive DIGITS, as presented in Algorithm 1, performs a very unstructured, expo- 
nential search over the output labelings of the sampled inputs—i.e., the possi- 
ble Boolean functions f in Algorithm 1. In our original paper [1] we present a 
“heuristic” implementation strategy that incrementally explores the set of pos- 
sible output labelings using a trie data structure. In this section, we study the 
complexity of this technique through the lens of computational learning theory 
and discover the surprising result that DIGITS requires a polynomial number 
of calls to the synthesizer in the size of the sample set! Our improved search 
algorithm (Sect. 4) inherits these results. 

For the remainder of this paper, we use DIGITS to refer to this incremental 
version. A full description is necessary for our analysis: Fig. 1 (non-framed rules 
only) consists of a collection of guarded rules describing the construction of the 
trie used by DIGITS to incrementally explore the set of possible output labelings. Our improved version, τ-DIGITS (presented in Sect. 4), corresponds to the addition of the framed parts; without them, the rules describe DIGITS.

Nodes in the trie represent partial output labelings, i.e., functions $f$ assigning Boolean values to only some of the samples in $S = \{x_1, \ldots, x_m\}$. Each node is identified by a binary string $\sigma = b_1 \cdots b_k$ ($k$ can be smaller than $m$) denoting the path to the node from the root. The string $\sigma$ also describes the partial output-labeling function $f$ corresponding to the node, i.e., if the $i$-th bit $b_i$ is set to 1, then $f(x_i) = true$. The set explored represents the nodes in the trie built thus far; for each new node, the algorithm synthesizes a program consistent with the corresponding partial output function (“Explore” rules). The variable depth controls the incremental aspect of the search and represents the maximum length of any $\sigma$ in explored; it is incremented whenever all nodes up to that depth have been explored (the “Deepen” rule). The crucial part of the algorithm is that, if no program can be synthesized for the partial output function of a node identified by $\sigma$, the algorithm does not need to issue further synthesis queries for the descendants of $\sigma$.

Figure 2 shows how DIGITS builds a trie for an example run on the interval 
programs from Example 1, where we suppose we begin with an incorrect program 
describing the interval [0, 0.3]. Initially, we set the root program to [0,0.3] (left 
Initialize
    explored ← {ε}    P_ε ← P̂    depth ← 0    best ← ⊥

Deepen
    ∀σ ∈ explored. ∀b ∈ {0,1}. (P_σ ≠ ⊥ ∧ |σb| ≤ depth ∧ [unblocked(σb)]) ⇒ σb ∈ explored
    ────────────────────────────────────────
    sample_{depth+1} ∼ D    depth ← depth + 1

Explore (Synthesis Query)
    σ ∈ explored    P_σ ≠ ⊥    b ∈ {0,1}    σb ∉ explored    |σb| ≤ depth    [unblocked(σb)]
    ────────────────────────────────────────
    P_σb ← O_syn({(sample_{i+1}, σb(i)) : 0 ≤ i < |σb|})    explored ← explored ∪ {σb}

Explore (Solution Propagation)
    σ ∈ explored    P_σ ≠ ⊥    b ∈ {0,1}    σb ∉ explored    |σb| ≤ depth    [unblocked(σb)]    P_σ(sample_{|σb|}) = b
    ────────────────────────────────────────
    P_σb ← P_σ    explored ← explored ∪ {σb}

Best
    σ* = argmin_σ {O_err(P_σ) | σ ∈ explored ∧ P_σ ≠ ⊥ ∧ O_ver(P_σ) = true}
    ────────────────────────────────────────
    best ← P_σ*

where unblocked(σ) ≡ |{i : 0 ≤ i < |σ| ∧ σ(i) ≠ P̂(sample_{i+1})}| ≤ τ · depth

Fig. 1. Full DIGITS description and our new extension, τ-DIGITS, shown in boxes (rendered as [ ]).


figure). The “Deepen” rule applies, so a sample is added to the set of samples— 
suppose it’s 0.4. “Explore” rules are then applied twice to build the children of 
the root: the child following the 0 branch needs to map 0.4 + 0, which [0, 0.3] 
already does, thus it is propagated to that child without asking Osyn to perform 
a synthesis query. For the child following 1, we instead make a synthesis query, 
using the oracle Os,n, for any value of a such that [0, a] maps 0.4 —> 1—suppose 
it returns the solution a = 1, and we associate [0, 1] with this node. At this point 
we have exhausted depth 1 (middle figure), so “Deepen” once again applies, 
perhaps adding 0.6 to the sample set. At this depth (right figure), only two calls 
to Osyn are made: in the case of the call at o = 01, there is no value of a that 
causes both 0.4 +> 0 and 0.6 +> 1, so Osyn returns L, and we do not try to explore 
any children of this node in the future. The algorithm continues in this manner 
until a stopping condition is reached—e.g., enough samples are enumerated. 
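To make the trie construction concrete, here is a minimal Python sketch of the non-framed rules of Fig. 1 on the interval programs $[0, a]$, replaying the run of Fig. 2. The names `run`, `synth`, and `digits` are hypothetical; `synth` is a toy closed-form stand-in for $O_{syn}$ (not the SMT-based synthesizer of Sect. 5), samples are assumed to be distinct values in (0, 1], and `None` plays the role of $\bot$.

```python
from typing import Dict, List, Optional, Tuple

Program = float  # the interval program [0, a] is represented by its endpoint a


def run(a: Program, x: float) -> int:
    """Evaluate the interval program [0, a] on input x (1 = accept)."""
    return 1 if 0.0 <= x <= a else 0


def synth(constraints: List[Tuple[float, int]]) -> Optional[Program]:
    """Toy stand-in for O_syn: some a consistent with every (x, label) pair,
    or None (standing for bottom) when no such a exists."""
    lo = max((x for x, lbl in constraints if lbl == 1), default=0.0)  # a >= every positive
    hi = min((x for x, lbl in constraints if lbl == 0), default=float("inf"))  # a < every negative
    if lo >= hi:
        return None
    # With no negative constraint, return a = 1 (or larger), mimicking the run in Fig. 2.
    return max(lo, 1.0) if hi == float("inf") else (lo + hi) / 2


def digits(p_hat: Program, samples: List[float]) -> Tuple[Dict[str, Optional[Program]], int]:
    """Trie exploration of Fig. 1 without the framed (threshold) parts.
    Returns the trie (node string sigma -> program or None) and the query count."""
    trie: Dict[str, Optional[Program]] = {"": p_hat}  # Initialize: root holds P-hat
    queries = 0
    for depth in range(1, len(samples) + 1):  # Deepen: sample_depth becomes available
        x = samples[depth - 1]
        parents = [s for s, p in trie.items() if len(s) == depth - 1 and p is not None]
        for sigma in parents:  # children of None nodes are never explored
            parent = trie[sigma]
            for b in (0, 1):
                child = sigma + str(b)
                if run(parent, x) == b:
                    trie[child] = parent  # Explore (Solution Propagation)
                else:
                    queries += 1  # Explore (Synthesis Query)
                    constraints = [(samples[i], int(child[i])) for i in range(depth)]
                    trie[child] = synth(constraints)
    return trie, queries


trie, q = digits(0.3, [0.4, 0.6])
print(q, trie["1"], trie["01"], trie["10"])  # 3 1.0 None 0.5
```

Because each node with a program issues exactly one query for its two children, a run on $m$ distinct samples performs $\frac{1}{2}m(m+1)$ queries; for instance, `digits(0.3, [0.5, 0.2, 0.9, 0.7, 0.1])` reports 15.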


3.2 Polynomial Bound on the Number of Synthesis Queries 


We observed in [1] that the trie-based exploration seems to be efficient in practice, despite potential exponential growth of the number of explored nodes in the trie as the depth of the search increases. The convergence analysis of DIGITS relies on the finite VC dimension of the program model, but VC dimension itself is just a summary of the growth function, a function that describes a notion of complexity of the set of programs in question. We will see that the growth function describes the behavior of the trie-based search much more precisely; we will then use a classic result from computational learning theory to derive better bounds on the performance of the search. We define the growth function below, adapting the presentation from [16].

[Figure: trie snapshots for the run described above; the final panel has S = {0.4, 0.6} and node programs [0, 0.3], [0, 1], and [0, 0.5].]

Fig. 2. Example execution of incremental DIGITS on interval programs, starting from $[0, 0.3]$. Hollow circles denote calls to $O_{syn}$ that yield new programs; the cross denotes a call to $O_{syn}$ that returns $\bot$.


Definition 5 (Realizable Dichotomies). We are given a set $\mathcal{P}$ of programs representing functions $X \to \{0,1\}$ and a (finite) set of inputs $S \subseteq X$. We call any $f : S \to \{0,1\}$ a dichotomy of $S$; if there exists a program $P \in \mathcal{P}$ that extends $f$ to its full domain $X$, we call $f$ a realizable dichotomy in $\mathcal{P}$. We denote the set of realizable dichotomies as

$$\Pi_{\mathcal{P}}(S) := \{ f : S \to \{0,1\} \mid \exists P \in \mathcal{P}.\ \forall x \in S.\ P(x) = f(x) \}.$$

Observe that, for any (infinite) set $\mathcal{P}$ and any finite set $S$, $1 \le |\Pi_{\mathcal{P}}(S)| \le 2^{|S|}$. We define the growth function in terms of the realizable dichotomies:


Definition 6 (Growth Function). The growth function is the maximal number of realizable dichotomies as a function of the number of samples, denoted

$$\Pi_{\mathcal{P}}(m) := \max_{|S| = m} |\Pi_{\mathcal{P}}(S)|.$$


Observe that $\mathcal{P}$ has VC dimension $d$ if and only if $d$ is the largest integer satisfying $\Pi_{\mathcal{P}}(d) = 2^d$ (and $\mathcal{P}$ has infinite VC dimension when $\Pi_{\mathcal{P}}(m)$ is identically $2^m$); in fact, VC dimension is often defined using this characterization.


Example 2. Consider the set of intervals of the form $[0, a]$ as in Example 1 and Fig. 2. For the set of two points $S = \{0.4, 0.6\}$, we have that $|\Pi_{[0,a]}(S)| = 3$ since, for example, $a = 0.5$ accepts 0.4 but not 0.6, $a = 0.3$ accepts neither, and $a = 1$ accepts both, so these three dichotomies are realizable; however, no interval with 0 as its left endpoint can accept 0.6 and not 0.4, so this dichotomy is not realizable. In fact, for any (finite) set $S \subset [0,1]$, we have that $|\Pi_{[0,a]}(S)| = |S| + 1$; we then have that $\Pi_{[0,a]}(m) = m + 1$.
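Definition 5 can be checked by brute force for the $[0, a]$ programs. The sketch below (the function name is hypothetical) assumes the inputs are distinct points in (0, 1]; under that assumption, the labeling produced by any $a$ is also produced by $a = 0$ or by $a$ equal to one of the samples, so a finite candidate set of endpoints suffices.

```python
from itertools import product
from typing import List, Set, Tuple

Labeling = Tuple[int, ...]


def realizable(S: List[float]) -> Set[Labeling]:
    """Pi_P(S) for P = {[0, a]}: every labeling of S produced by some [0, a]."""
    candidates = [0.0] + list(S)  # finitely many endpoints cover every behavior on S
    return {tuple(1 if 0.0 <= x <= a else 0 for x in S) for a in candidates}


S = [0.4, 0.6]
pi = realizable(S)
all_labelings = set(product((0, 1), repeat=len(S)))
print(sorted(pi))          # [(0, 0), (1, 0), (1, 1)]
print(all_labelings - pi)  # {(0, 1)}: accepting 0.6 but rejecting 0.4 is impossible
```

Trying larger sets, e.g. five distinct points, yields six realizable dichotomies, in line with $|\Pi_{[0,a]}(S)| = |S| + 1$.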
When DIGITS terminates having used a sample set $S$, it has considered all the dichotomies of $S$: the programs it has enumerated correspond exactly to extensions of the realizable dichotomies $\Pi_{\mathcal{P}}(S)$. The trie-based exploration effectively tries to minimize the number of $O_{syn}$ queries performed on non-realizable ones, but does so without explicit knowledge of the full functional behavior of programs in $\mathcal{P}$. In fact, it manages to stay relatively close to performing queries only on the realizable dichotomies:


Lemma 1. DIGITS performs at most $|S| \cdot |\Pi_{\mathcal{P}}(S)|$ synthesis oracle queries. More precisely, let $S = \{x_1, \dots, x_m\}$ be indexed by the depth at which each sample was added: the exact number of synthesis queries is $\sum_{d=1}^{m} |\Pi_{\mathcal{P}}(\{x_1, \dots, x_{d-1}\})|$.


Proof. Let $T_d$ denote the total number of queries performed once depth $d$ is completed. We perform no queries for the root,⁴ thus $T_0 = 0$. Upon completing depth $d - 1$, the realizable dichotomies of $\{x_1, \dots, x_{d-1}\}$ exactly specify the nodes whose children will be explored at depth $d$. For each such node, one child is skipped due to solution propagation, while an oracle query is performed on the other, thus $T_d = T_{d-1} + |\Pi_{\mathcal{P}}(\{x_1, \dots, x_{d-1}\})|$. Lastly, $|\Pi_{\mathcal{P}}(S)|$ cannot decrease by adding elements to $S$, so we have that

$$T_m = \sum_{d=1}^{m} |\Pi_{\mathcal{P}}(\{x_1, \dots, x_{d-1}\})| \le \sum_{d=1}^{m} |\Pi_{\mathcal{P}}(S)| = |S| \cdot |\Pi_{\mathcal{P}}(S)|. \qquad \square$$


Connecting DIGITS to the realizable dichotomies and, in turn, the growth function allows us to employ a remarkable result from computational learning theory, stating that the growth function for any set exhibits one of two asymptotic behaviors: it is either identically $2^m$ (infinite VC dimension) or dominated by a polynomial! This is commonly called the Sauer-Shelah Lemma [24, 26]:


Lemma 2 (Sauer-Shelah). If $\mathcal{P}$ has finite VC dimension $d$, then for all $m \ge d$, $\Pi_{\mathcal{P}}(m) \le \left(\frac{em}{d}\right)^{d}$; i.e., $\Pi_{\mathcal{P}}(m) = O(m^{d})$.
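As a numerical illustration of Lemma 2, consider closed intervals $[l, u]$ on the real line, i.e., the one-dimensional case of the axis-aligned hyperrectangles of Sect. 5.1, which have VC dimension 2. The sketch below (helper name hypothetical; distinct sample points assumed) counts realizable dichotomies by brute force and compares them with $2^m$, with the standard intermediate form of the lemma, $\sum_{i=0}^{d}\binom{m}{i}$, and with $(em/d)^d$. For intervals, any $m$ distinct points give the same count, so it equals $\Pi_{\mathcal{P}}(m)$.

```python
from math import comb, e
from typing import List, Set, Tuple


def interval_dichotomies(S: List[float]) -> Set[Tuple[int, ...]]:
    """Realizable dichotomies of S for closed intervals [lower, upper] on the real line."""
    pts = sorted(S)
    # A tiny interval between or beyond the (distinct) points accepts nothing.
    labelings = {tuple(0 for _ in S)}
    # Any nonempty accepted set is a contiguous run pts[i..j] of the sorted points.
    for i in range(len(pts)):
        for j in range(i, len(pts)):
            lower, upper = pts[i], pts[j]
            labelings.add(tuple(1 if lower <= x <= upper else 0 for x in S))
    return labelings


VC_DIM = 2
for m in range(2, 8):
    S = [k / 10 for k in range(m)]  # m distinct points
    growth = len(interval_dichotomies(S))
    sauer_sum = sum(comb(m, i) for i in range(VC_DIM + 1))
    sauer_exp = (e * m / VC_DIM) ** VC_DIM
    print(m, growth, 2 ** m, sauer_sum, round(sauer_exp, 1))
```

The brute-force count equals $1 + m(m+1)/2$, which matches the Sauer sum exactly and falls below $2^m$ from $m = 3$ on, consistent with VC dimension 2.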


Combining our lemma with this famous one yields a surprising result: for a fixed set of programs $\mathcal{P}$ with finite VC dimension, the number of oracle queries performed by DIGITS is guaranteed to be polynomial in the depth of the search, where the degree of the polynomial is determined by the VC dimension:


Theorem 2. If $\mathcal{P}$ has VC dimension $d$, then DIGITS performs $O(m^{d+1})$ synthesis-oracle queries.


In short, an execution of DIGITS seems to enumerate a sub-exponential number of programs (as a function of the depth of the search) because that number must in fact be polynomial. Furthermore, the algorithm performs oracle queries almost exclusively on those polynomially many realizable dichotomies.


Example 3. A DIGITS run on the $[0, a]$ programs as in Fig. 2 using a sample set of size $m$ will perform $O(m^2)$ oracle queries, since the VC dimension of these intervals is 1. (In fact, every run of the algorithm on these programs will perform exactly $\frac{1}{2}m(m+1)$ queries.)


4 We assume the functional specification itself is some $\hat{P} \in \mathcal{P}$ and thus can be used; the alternative is a trivial synthesis query on an empty set of constraints.
4 Property-Directed τ-DIGITS


DIGITS has better convergence guarantees when it operates on larger sets of sampled inputs. In this section, we describe a new optimization of DIGITS that reduces the number of synthesis queries performed by the algorithm so that it more quickly reaches higher depths in the trie, and thus can scale to larger sample sets. This optimized DIGITS, called τ-DIGITS, is shown in Fig. 1 as the set of all the rules of DIGITS plus the framed elements. The high-level idea is to skip synthesis queries that are (quantifiably) unlikely to result in optimal solutions. For example, if the functional specification $\hat{P}$ maps every sampled input in $S$ to 0, then the synthesis query on the mapping of every element of $S$ to 1 becomes increasingly likely to result in programs that have maximal distance from $\hat{P}$ as the size of $S$ increases; hence the algorithm could probably avoid performing that query. In the following, we make use of the concept of Hamming distance between pairs of programs:


Definition 7 (Hamming Distance). For any finite set of inputs $S$ and any two programs $P_1, P_2$, we denote $\mathit{Hamming}_S(P_1, P_2) := |\{x \in S \mid P_1(x) \neq P_2(x)\}|$ (we will also allow any $\{0,1\}$-valued string to be an argument of $\mathit{Hamming}_S$).
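A minimal sketch of Definition 7 and of the framed $\mathit{unblocked}$ predicate from Fig. 1, assuming programs are Python callables returning 0 or 1 and a constraint string $\sigma$ is a string of '0'/'1' characters; the function names are illustrative and not taken from the authors' implementation.

```python
from typing import Callable, Sequence

Program = Callable[[float], int]


def hamming(S: Sequence[float], p1: Program, p2: Program) -> int:
    """Hamming_S(P1, P2): number of inputs in S on which P1 and P2 disagree."""
    return sum(1 for x in S if p1(x) != p2(x))


def unblocked(sigma: str, p_hat: Program, samples: Sequence[float],
              tau: float, depth: int) -> bool:
    """Framed predicate of Fig. 1: sigma disagrees with P-hat on at most
    tau * depth of the samples it labels (sigma[i] is the label of samples[i])."""
    disagreements = sum(1 for i, bit in enumerate(sigma) if int(bit) != p_hat(samples[i]))
    return disagreements <= tau * depth


def interval(a: float) -> Program:
    """The program [0, a] from Example 1."""
    return lambda x: 1 if 0.0 <= x <= a else 0


def zero(_x: float) -> int:
    """Constant-zero specification, as used in Sect. 5.3."""
    return 0


S = [0.2, 0.4, 0.6]
print(hamming(S, interval(0.3), interval(0.5)))  # 1: only 0.4 is labeled differently

samples = [0.1, 0.2, 0.3, 0.4]
print(unblocked("0110", zero, samples, tau=0.5, depth=4))  # True: 2 <= 2
print(unblocked("0111", zero, samples, tau=0.5, depth=4))  # False: 3 > 2
```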


4.1 Algorithm Description 


Fix the given functional specification $\hat{P}$ and suppose that there exists an $\varepsilon$-robust solution $P^*$ with (nearly) minimal error $k = \mathit{Er}(P^*) := \Pr_{x \sim D}[\hat{P}(x) \neq P^*(x)]$; we would be happy to find any program $P$ in $P^*$'s $\varepsilon$-ball. Suppose we angelically know $k$ a priori, and we thus restrict our search (for each depth $m$) only to constraint strings (i.e., $\sigma$ in Fig. 1) whose Hamming distance from $\hat{P}$ is not much larger than $km$.

To be specific, we first fix some threshold $\tau \in (k, 1]$. Intuitively, the optimization corresponds to modifying DIGITS to consider only paths $\sigma$ through the trie such that $\mathit{Hamming}_S(\hat{P}, \sigma) \le \tau |S|$. This is performed using the unblocked function in Fig. 1. Since we are ignoring certain paths through the trie, we need to ask: how much does this decrease the probability of the algorithm succeeding? It depends on the tightness of the threshold, which we address in Sect. 4.2. In Sect. 4.3, we discuss how to adaptively modify the threshold $\tau$ as τ-DIGITS is executing, which is useful when a good $\tau$ is unknown a priori.


4.2 Analyzing Failure Probability with Thresholding 


Using τ-DIGITS, the choice of $\tau$ will affect both (i) how many synthesis queries are performed, and (ii) the likelihood that we miss optimal solutions; in this section we explore the latter point.⁵ Interestingly, we will see that all of the analysis depends only on parameters directly related to the threshold; notably, none of it depends on the complexity of $\mathcal{P}$ (i.e., its VC dimension).


5 The former point is a difficult combinatorial question that to our knowledge has no 
precedent in the computational learning literature, and so we leave it as future work. 
If we really want to learn (something close to) a program $P^*$, then we should use a value of the threshold $\tau$ such that $\Pr_{S \sim D^m}[\mathit{Hamming}_S(\hat{P}, P^*) \le \tau m]$ is large; doing so requires knowledge of the distribution of $\mathit{Hamming}_S(\hat{P}, P^*)$. Recall the binomial distribution: for parameters $(n, p)$, it describes the number of successes in $n$ independent trials of an experiment that has success probability $p$.


Claim. Fix $P$ and let $k = \Pr_{x \sim D}[\hat{P}(x) \neq P(x)]$. If $S$ is sampled from $D^m$, then $\mathit{Hamming}_S(\hat{P}, P)$ is binomially distributed with parameters $(m, k)$.
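The Claim can be sanity-checked by simulation. The sketch below assumes, purely for illustration, $D$ uniform on [0, 1), $\hat{P} = [0, 0.3]$, and $P = [0, 0.5]$; the two programs disagree exactly on (0.3, 0.5], so $k = 0.2$, and the empirical mean and variance of $\mathit{Hamming}_S(\hat{P}, P)$ should approach the binomial values $mk$ and $mk(1-k)$.

```python
import random
from typing import Tuple


def hamming_moments(m: int = 100, trials: int = 2000, seed: int = 0) -> Tuple[float, float]:
    """Empirical mean and variance of Hamming_S(P-hat, P) over S ~ D^m, with the
    illustrative choices P-hat = [0, 0.3], P = [0, 0.5], and D = Uniform[0, 1)."""
    rng = random.Random(seed)
    counts = []
    for _ in range(trials):
        S = [rng.random() for _ in range(m)]
        counts.append(sum(1 for x in S if (x <= 0.3) != (x <= 0.5)))
    mean = sum(counts) / trials
    var = sum((c - mean) ** 2 for c in counts) / trials
    return mean, var


mean, var = hamming_moments()
k, m = 0.2, 100
print(f"mean {mean:.2f} vs m*k = {m * k}, variance {var:.2f} vs m*k*(1-k) = {m * k * (1 - k):.2f}")
```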


Next, we will use our knowledge of this distribution to reason about the failure probability, i.e., the probability that τ-DIGITS does not preserve the convergence result of DIGITS.

The simplest argument we can make is a union-bound style argument: the thresholded algorithm can “fail” by (i) failing to sample an $\varepsilon$-net, or otherwise (ii) sampling a set on which the optimal solution has a Hamming distance that is not representative of its actual distance. We quantify this failure probability in the following theorem:


Theorem 3. Let $P^*$ be a target $\varepsilon$-robust program with $k = \Pr_{x \sim D}[\hat{P}(x) \neq P^*(x)]$, and let $\delta$ be the probability that $m$ samples do not form an $\varepsilon$-net for $P^*$. If we run τ-DIGITS with $\tau \in (k, 1]$, then the failure probability is at most $\delta + \Pr[X > \tau m]$, where $X \sim \mathrm{Binomial}(m, k)$.


In other words, we can use tail probabilities of the binomial distribution to bound 
the probability that the threshold causes us to “miss” a desirable program we 
otherwise would have enumerated. Explicitly, we have the following corollary: 


Corollary 1. τ-DIGITS increases failure probability (relative to DIGITS) by at most $\Pr[X > \tau m] = \sum_{i > \tau m} \binom{m}{i} k^{i} (1-k)^{m-i}$.


Informally, when $m$ is not too small, $k$ is not too large, and $\tau$ is reasonably forgiving, these tail probabilities can be quite small. We can even analyze the asymptotic behavior by using any existing upper bounds on the binomial distribution's tail probabilities; importantly, the additional error diminishes exponentially as $m$ increases, at a rate that depends on the size of $\tau$ relative to $k$.


Corollary 2. τ-DIGITS increases failure probability by at most $e^{-2m(\tau - k)^2}$.⁶

Example 4. Suppose $m = 100$, $k = 0.1$, and $\tau = 0.2$. Then the extra failure probability term in Theorem 3 is less than 0.001.
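Example 4 can be reproduced numerically. This sketch (function names hypothetical) evaluates the exact tail $\Pr[X > \tau m]$ from Corollary 1 alongside the bound of Corollary 2 and the sharper bound given in footnote 6, using only Python's standard library.

```python
from math import comb, exp, floor, log


def exact_tail(m: int, k: float, tau: float) -> float:
    """Pr[X > tau*m] for X ~ Binomial(m, k), as in Corollary 1."""
    # For integer X, X > tau*m  <=>  X >= floor(tau*m) + 1.
    start = floor(tau * m) + 1
    return sum(comb(m, i) * k**i * (1 - k) ** (m - i) for i in range(start, m + 1))


def hoeffding_bound(m: int, k: float, tau: float) -> float:
    """Corollary 2: exp(-2 m (tau - k)^2)."""
    return exp(-2 * m * (tau - k) ** 2)


def kl_bound(m: int, k: float, tau: float) -> float:
    """Footnote 6: exp(-m (tau ln(tau/k) + (1 - tau) ln((1 - tau)/(1 - k)))), for k < tau < 1."""
    return exp(-m * (tau * log(tau / k) + (1 - tau) * log((1 - tau) / (1 - k))))


m, k, tau = 100, 0.1, 0.2
tail, kl, hoeff = exact_tail(m, k, tau), kl_bound(m, k, tau), hoeffding_bound(m, k, tau)
print(f"exact {tail:.2e}  KL bound {kl:.2e}  Hoeffding bound {hoeff:.2e}")
```

With these parameters the exact tail comes out just under $10^{-3}$ (about $8 \times 10^{-4}$), consistent with Example 4, whereas the closed-form bounds are looser (roughly 0.012 for footnote 6 and $e^{-2} \approx 0.135$ for Corollary 2); the figure in Example 4 therefore relies on the exact tail of Theorem 3.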


As stated at the beginning of this subsection, the balancing act is to choose $\tau$ (i) small enough that the algorithm is still fast for large $m$, yet (ii) large enough that the algorithm is still likely to learn the desired programs. The further challenge is to relax our initial strong assumption that we know the optimal $k$ a priori when determining $\tau$, which we address in the following subsection.


6 A more precise (though less convenient) bound is $e^{-m\left(\tau \ln\frac{\tau}{k} + (1-\tau)\ln\frac{1-\tau}{1-k}\right)}$.
4.3 Adaptive Threshold


Of course, we do not have the angelic knowledge that lets us pick an ideal threshold $\tau$; the only absolutely sound choice we can make is the trivial $\tau = 1$. Fortunately, we can begin with this choice of $\tau$ and adaptively refine it as the search progresses. Specifically, every time we encounter a correct program $P$ such that $k = \mathit{Er}(P)$, we can refine $\tau$ to reflect our newfound knowledge that “the best solution has distance of at most $k$.”

We refer to this refinement as adaptive τ-DIGITS. The modification involves the addition of the following rule to Fig. 1:

Refine Threshold (for some $g : [0,1] \to [0,1]$)
$$\frac{\mathit{best} \neq \bot}{\tau \leftarrow g(O_{err}(\mathit{best}))}$$


We can use any (non-decreasing) function $g$ to update the threshold $\tau \leftarrow g(k)$. The simplest choice would be the identity function (which we use in our experiments), although one could use a looser function so as not to over-prune the search. If we choose functions of the form $g(k) = k + b$, then Corollary 2 allows us to make (somewhat weak) claims of the following form:


Claim. Suppose the adaptive algorithm completes a search of up to depth $m$, yielding a best solution with error $k$ (so the final threshold value is $\tau = k + b$). Suppose also that $P^*$ is an optimal $\varepsilon$-robust program at distance $k - \eta$. The optimization-added failure probability (as in Corollary 1) for a run of (non-adaptive) τ-DIGITS completing depth $m$ and using this $\tau$ is at most $e^{-2m(b + \eta)^2}$.


5 Evaluation 


Implementation. In this section, we evaluate our new algorithm τ-DIGITS (Fig. 1) and its adaptive variant (Sect. 4.3) against DIGITS (i.e., τ-DIGITS with $\tau = 1$). Both algorithms are implemented in Python and use the SMT solver Z3 [8] to implement a sketch-based synthesizer $O_{syn}$. We employ statistical verification for $O_{ver}$ and $O_{err}$: we use Hoeffding's inequality for estimating probabilities in post and $\mathit{Er}$. Probabilities are computed with 95% confidence, leaving our oracles potentially unsound.
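For $n$ i.i.d. Bernoulli samples, Hoeffding's inequality gives $\Pr[|\hat{p} - p| \ge \epsilon] \le 2e^{-2n\epsilon^2}$. The sketch below shows one standard way to turn this into a 95%-confidence interval or a sample-size requirement; it is our own illustration with hypothetical names, not the authors' oracle code, and the event and sampler in the usage lines are arbitrary examples.

```python
import random
from math import ceil, log, sqrt
from typing import Callable, Tuple


def hoeffding_halfwidth(n: int, delta: float = 0.05) -> float:
    """Half-width eps such that Pr[|p_hat - p| >= eps] <= delta after n samples."""
    return sqrt(log(2 / delta) / (2 * n))


def hoeffding_samples(eps: float, delta: float = 0.05) -> int:
    """Smallest n whose two-sided Hoeffding half-width is at most eps."""
    return ceil(log(2 / delta) / (2 * eps**2))


def estimate(event: Callable[[float], bool], sample: Callable[[], float],
             n: int, delta: float = 0.05) -> Tuple[float, float]:
    """(1 - delta)-confidence interval for Pr[event(x)], x drawn by `sample`."""
    p_hat = sum(1 for _ in range(n) if event(sample())) / n
    eps = hoeffding_halfwidth(n, delta)
    return max(0.0, p_hat - eps), min(1.0, p_hat + eps)


rng = random.Random(1)
n = hoeffding_samples(0.01)  # samples needed for +/- 0.01 at 95% confidence
lo, hi = estimate(lambda x: x < 0.3, rng.random, n)
print(n, round(lo, 4), round(hi, 4))
```

As the sentence above notes, such an interval holds only with the chosen confidence, so an oracle built this way can occasionally give a wrong verdict.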


Research Questions. Our evaluation aims to answer the following questions: 


RQ1 Is adaptive τ-DIGITS more effective/precise than τ-DIGITS?
RQ2 Is τ-DIGITS more effective/precise than DIGITS?
RQ3 Can τ-DIGITS solve challenging synthesis problems?


We experiment on three sets of benchmarks: (i) synthetic examples for which the optimal solutions can be computed analytically (Sect. 5.1), (ii) the set of benchmarks considered in the original DIGITS paper (Sect. 5.2), and (iii) a variant of the thermostat-controller synthesis problem presented in [7] (Sect. 5.3).
5.1 Synthetic Benchmarks


We consider a class of synthetic programs for which we can compute the optimal solution exactly; this lets us compare the results of our implementation to an ideal baseline. Here, the program model $\mathcal{P}$ is defined as the set of axis-aligned hyperrectangles within $[-1,1]^d$ ($d \in \{1,2,3\}$; the VC dimension is $2d$), and the input distribution $D$ is such that inputs are distributed uniformly over $[-1,1]^d$. We fix some probability mass $b \in \{0.05, 0.1, 0.2\}$ and define the benchmarks so that the best error for a correct solution is exactly $b$ (for details, see [9]).

We run our implementation using thresholds $\tau \in \{0.07, 0.15, 0.3, 0.5, 1\}$, omitting those values for which $\tau < b$; additionally, we also consider an adaptive run where $\tau$ is initialized to 1, and whenever a new best solution is enumerated with error $k$, we update $\tau \leftarrow k$. Each combination of parameters was run for 2 min. Figure 3 focuses on $d = 1$, $b = 0.1$ and shows each of the following as a function of time: (i) the depth completed by the search (i.e., the current size of the sample set), and (ii) the error of the best solution found by the search. (See the full version of our paper [9] for other configurations of $(d, b)$.)


[Figure: left panel, Depth Completed vs. Time (s); right panel, Best Error vs. Time (s) (log scale); curves for adaptive, τ = 1, τ = 0.5, τ = 0.3, and τ = 0.15.]


Fig. 3. Synthetic hyperrectangle problem instance with parameters d = 1, b = 0.1. 


By studying Fig. 3, we see that the adaptive threshold search performs at least as well as the tight thresholds fixed a priori because reasonable solutions are found early. In fact, all search configurations find solutions very close to the optimal error (indicated by the horizontal dashed line). However, they reach different depths, and the main advantage of reaching large depths concerns the strength of the optimality guarantee. Note also that small $\tau$ values are necessary to see improvements in the completed depth of the search. Indeed, the discrepancy between the depth-versus-time functions diminishes drastically for the problem instances with larger values of $b$ (see the full version of our paper [9]); the gains of the optimization are contingent on the existence of correct solutions close to the functional specification.
Findings (RQ1): τ-DIGITS does tend to find reasonable solutions at early depths and near-optimal solutions at later depths, thus adaptive τ-DIGITS is more effective than τ-DIGITS, and we use it throughout our remaining experiments.


5.2 Original DIGITS Benchmarks 


The original DIGITS paper [1] evaluates on a set of 18 repair problems of varying 
complexity. The functional specifications are machine-learned decision trees and 
support vector machines, and each search space P involves the set of programs 
formed by replacing some number of real-valued constants in the program with 
holes. The postcondition is a form of algorithmic fairness—e.g., the program 
should output true on inputs of type A as often as it does on inputs of type 
B [11]. For each such repair problem, we run both DIGITS and adaptive τ-DIGITS (again, with initial $\tau = 1$ and the identity refinement function). Each benchmark
is run for 10 min, where the same sample set is used for both algorithms. 


[Figure: two scatter plots of adaptive τ-DIGITS (vertical axis) against DIGITS (horizontal axis); left panel, Depth Completed; right panel, Best Error.]


Fig. 4. Improvement of using adaptive τ-DIGITS on the original DIGITS benchmarks. Left: the dotted line marks the 2.4× average increase in depth.


Figure 4 shows, for each benchmark, (i) the largest sample set size completed by adaptive τ-DIGITS versus DIGITS (left; points above the diagonal line indicate that adaptive τ-DIGITS reaches further depths), and (ii) the error of the best solution found by adaptive τ-DIGITS versus DIGITS (right; points below the diagonal line indicate that adaptive τ-DIGITS finds better solutions). We see that adaptive τ-DIGITS reaches further depths on every problem instance, often by a substantial margin, and that it finds better solutions on 10 of the 18 problems. For those that did not improve, either the search was already deep enough that DIGITS was able to find near-optimal solutions, or the synthesis queries are so complex that the search is still constrained to small depths.


Findings (RQ2): Adaptive τ-DIGITS can find better solutions than those found by DIGITS and can reach greater search depths.
5.3 Thermostat Controller


We challenge adaptive τ-DIGITS with the task of synthesizing a thermostat controller, borrowing the benchmark from [7]. The input to the controller is the
initial temperature of the environment; since the world is uncertain, there is a 
specified probability distribution over the temperatures. The controller itself is a 
program sketch consisting primarily of a single main loop: iterations of the loop 
correspond to timesteps, during which the synthesized parameters dictate an 
incremental update made by the thermostat based on the current temperature. 
The loop runs for 40 iterations, then terminates, returning the absolute value of 
the difference between its final actual temperature and the target temperature. 

The postcondition is a Boolean probabilistic correctness property intuitively 
corresponding to controller safety, e.g. with high probability, the temperature 
should never exceed certain thresholds. In [7], there is a quantitative objective in the form of minimizing the expected value $E[|\mathit{actual} - \mathit{target}|]$; our setting does not admit optimizing with respect to expectations, so we must modify the problem. Instead, we fix some value $N$ ($N \in \{2,4,8\}$) and have the program return 0 when $|\mathit{actual} - \mathit{target}| < N$ and 1 otherwise. Our quantitative objective is to minimize the error from the constant-zero functional specification $\hat{P}(x) := 0$ (i.e., the actual temperature always gets close enough to the target). The full
specification of the controller is provided in the full version of our paper [9]. 

We consider variants of the program where the thermostat runs for fewer 
timesteps and try loop unrollings of size {5, 10, 20, 40}. We run each benchmark for 10 min: the final completed search depths and best error of solutions are
shown in Fig. 5. For this particular experiment, we use the SMT solver CVC4 [3] 
because it performs better than Z3 on the occurring SMT instances. 
Fig. 5. Thermostat controller results.


As we would expect, for larger values of $N$ it is “easier” for the thermostat to reach the target temperature threshold, and thus the quality of the best solution increases with $N$. However, with small unrollings (i.e., 5), the synthesized controllers do not have enough iterations (time) to modify the temperature enough for the probability mass of extremal temperatures to reach the target: as we increase the number of unrollings to 10, we see that better solutions can be found since the set of programs is capable of stronger behavior.

On the other hand, the completed depth of the search plummets as the 
unrolling increases due to the complexity of the Osyn queries. Consequently, for 
20 and 40 unrollings, adaptive τ-DIGITS synthesizes worse solutions because it
cannot reach the necessary depths to obtain better guarantees. 

One final point of note is that for $N = 8$ and 10 unrollings, there seems to be a sharp spike in the completed depth. However, this is somewhat artificial: because $N = 8$ creates a very lenient quantitative objective, an early $O_{syn}$ query happens to yield a program with an error less than $10^{-3}$. Adaptive τ-DIGITS then updates $\tau \leftarrow 10^{-3}$ and skips most synthesis queries.

Findings (RQ3): Adaptive τ-DIGITS can synthesize small variants of a complex thermostat controller, but cannot solve variants with many loop iterations.


6 Related Work 


Synthesis and Probability. Program synthesis is a mature area with many 
powerful techniques. The primary focus is on synthesis under Boolean con- 
straints, and probabilistic specifications have received less attention [1,7,17,19]. 
We discuss the works that are most related to ours. 

DIGITS [1] is the most relevant work. First, we show for the first time that 
DIGITS only requires a number of synthesis queries polynomial in the number of 
samples. Second, our adaptive τ-DIGITS further reduces the number of synthesis
queries required to solve a synthesis problem without sacrificing correctness. 

The technique of smoothed proof search [7] approximates a combination of
functional correctness and maximization of an expected value as a smooth, con- 
tinuous function. It then uses numerical methods to find a local optimum of 
this function, which translates to a synthesized program that is likely to be cor- 
rect and locally maximal. The benchmarks described in Sect. 5.3 are variants 
of benchmarks from [7]. Smoothed proof search can minimize expectation; τ-DIGITS minimizes probability only. However, unlike τ-DIGITS, smoothed proof
search lacks formal convergence guarantees and cannot support the rich proba- 
bilistic postconditions we support, e.g., as in the fairness benchmarks. 

Works on synthesis of probabilistic programs are aimed at a different prob- 
lem [6,19,23]: that of synthesizing a generative model of data. For example, 
Nori et al. [19] use sketches of probabilistic programs and complete them with 
a stochastic search. Recently, Saad et al. [23] synthesize an ensemble of proba- 
bilistic programs for learning Gaussian processes and other models. 

Kucera et al. [17] present a technique for automatically synthesizing program 
transformations that introduce uncertainty into a given program with the goal of 
satisfying given privacy policies—e.g., preventing information leaks. They lever- 
age the specific structure of their problem to reduce it to an SMT constraint 
solving problem. The problem tackled in [17] is orthogonal to the one targeted 
in this paper and the techniques are therefore very different. 
Stochastic Satisfiability. Our problem is closely related to E-MAJSAT [18], a special case of stochastic satisfiability (SSAT) [20] and a means for formalizing probabilistic planning problems. E-MAJSAT is of $\mathrm{NP}^{\mathrm{PP}}$ complexity. An E-MAJSAT
formula has deterministic and probabilistic variables. The goal is to find an 
assignment of deterministic variables such that the probability that the formula 
is satisfied is above a given threshold. Our setting is similar, but we operate over 
complex program statements and have an additional optimization objective (i.e., 
the program should be close to the functional specification). The deterministic 
variables in our setting are the holes defining the search space; the probabilistic 
variables are program inputs. 
Abstract. We present two algorithmic approaches for synthesizing linear hybrid automata from experimental data. Unlike previous
approaches, our algorithms work without a template and generate an 
automaton with nondeterministic guards and invariants, and with an 
arbitrary number and topology of modes. They thus construct a suc- 
cinct model from the data and provide formal guarantees. In particular, 
(1) the generated automaton can reproduce the data up to a specified 
tolerance and (2) the automaton is tight, given the first guarantee. Our 
first approach encodes the synthesis problem as a logical formula in the 
theory of linear arithmetic, which can then be solved by an SMT solver. 
This approach minimizes the number of modes in the resulting model but 
is only feasible for limited data sets. To address scalability, we propose 
a second approach that does not insist on finding a minimal model. The
algorithm constructs an initial automaton and then iteratively extends 
the automaton based on processing new data. Therefore, the algorithm
is well-suited for online and synthesis-in-the-loop applications. The core 
of the algorithm is a membership query that checks whether, within the 
specified tolerance, a given data set can result from the execution of a 
given automaton. We solve this membership problem for linear hybrid 
automata by repeated reachability computations. We demonstrate the 
effectiveness of the algorithm on synthetic data sets and on cardiac-cell 
measurements. 


Keywords: Synthesis · Linear hybrid automaton · Membership


1 Introduction 


The natural sciences seek to understand the mechanisms of real systems and to make this understanding accessible. Achieving these two goals requires observation, analysis, and modeling of the system. Typically, physical components of a system evolve continuously in real time, while the system may switch among a finite set of discrete states. This applies to cyber-physical systems but also to purely analog systems; e.g., an animal's hunger affects its movement. A proper formalism for modeling such types of systems with mixed discrete-continuous behavior is a hybrid automaton [11]. Unlike black-box models such as neural networks, hybrid automata are easy for humans to interpret. However, designing such models is a time-intensive and error-prone process, usually conducted by an expert who analyzes the experimental data and makes decisions.

In this paper, we propose two automatic approaches for synthesizing a linear 
hybrid automaton [1] from experimental data. The approaches provide two main 
properties. The first property is soundness, which ensures that the generated 
model has enough executions: these executions approximate the given data up to 
a predefined accuracy. The second property is precision, which ensures that 
the generated model does not have too many executions. The behavior of a 
hybrid automaton is constrained by so-called invariants and guards. Precision 
expresses that the boundaries of these invariants and guards are witnessed by 
the data, which indicates that the constraints cannot be made tighter. Moreover, 
the proposed synthesis algorithm is complete for a general class of linear hybrid 
automata, i.e., the algorithm can synthesize any given model from this class. 

The first approach reduces the synthesis problem to a satisfiability ques- 
tion for a linear-arithmetic formula. The formula allows us to encode a min- 
imality constraint (namely in the number of so-called modes) on the resulting 
model. This approach is, however, not scalable, which motivates our second app- 
roach. Our second approach follows an iterative model-adaptation scheme. Apart 
from scalability advantages, this online algorithm is thus also well-suited for 
synthesis-in-the-loop applications. 

After constructing an initial model, the second approach iteratively improves 
and expands the model by considering new experiments. After each iteration, the 
model will capture all behaviors exhibited in the previous experiments. Given 
an automaton and new experimental data, the algorithm proceeds as follows. 
First we ask whether the current automaton already captures the data. We 
pose this question as a membership query for a piecewise-linear function in the 
set of executions of the automaton. For the membership query, we present an 
algorithm based on reachability inside a tube around the function. If the data is 
not captured, we need to modify the automaton accordingly by adding behavior. 
We first try to relax the above-mentioned invariants and guards, which we reduce 
to another membership query. If that query is negative as well, we choose a path 
in the automaton that closely resembles the given data and then modify the 
automaton along that path by also adding new discrete structure (called modes 
and transitions). This modification step is again guided by membership queries 
to identify the aspects of the model that require improvement and expansion. 

As the main contributions, (1) we present an online algorithm for automatic 
synthesis of linear hybrid automata from data that is sound, i.e., guarantees 
that the generated model approximates the data up to a user-defined threshold, 
precise, i.e., the generated model is tight, and complete for a general class of models; and (2) we solve the membership problem of a piecewise-linear function in a linear hybrid automaton, which is a critical step in our synthesis algorithm.


Related Work. The synthesis of hybrid systems was initially studied in control 
theory under the term identification, mainly focused on (discrete-time) switched 
autoregressive exogenous (SARX) and piecewise-affine autoregressive exogenous 
(PWARX) models [7,18]. SARX models constitute a subclass of linear hybrid 
automata with deterministic switching behavior. PWARX models are specific 
SARX models where the mode invariants form a state-space partition. Fixing 
the number of modes, the identification problem from input-output data can be 
solved algebraically by inferring template parameters. However, in contrast to 
linear hybrid automata, the lack of nondeterminism and the underlying assump- 
tion that there is no hidden state (mode) limits the applicability of these models. 
An algorithm by Bemporad et al. constructs a PWARX model that satisfies a 
global error bound [5]. Ozay presents an algorithm for SARX models where the 
switching is purely time-triggered [17]. There also exist a few online algorithms 
for the recursive synthesis of PWARX models based on pattern recognition [19] 
or lifting to a high-dimensional identification problem for ARX models [10,22]. 

Synthesis is also known as process mining, and as learning models from traces; 
the latter refers to approaches based on learning finite-state machines [3] or 
other machine-learning techniques. More recently, synthesis of hybrid automaton 
models has gained attention. All existing approaches that we are aware of have 
structural restrictions of some sort, which we describe below. We synthesize, 
for the first time, a general class of linear hybrid automata which (1) allows 
nondeterminism to capture many behaviors by a concise representation and 
(2) provides formal soundness and precision guarantees. The algorithm is also 
the first online synthesis approach for linear hybrid automata. 

The general synthesis problem for hybrid automata is hard: for deterministic 
timed automata (a subclass of linear hybrid automata with globally identical 
continuous dynamics), one may already require data of exponential length [21]. 
The approach by Niggemann et al. constructs an automaton with acyclic dis- 
crete structure [16], while the approach by Grosu et al., intended to model purely 
periodic behavior, constructs a cyclic-linear hybrid automaton whose discrete 
structure consists of a loop [8]. Ly and Lipson use symbolic regression to infer a 
non-linear hybrid automaton [14]. However, their model neither contains state 
variables (i.e., the model is purely input-driven, comparable to the SARX model) 
nor invariants, and the number of modes needs to be fixed in advance. Medhat 
et al. describe an abstract framework, based on heuristics, to learn linear hybrid 
automata from input/output traces [15]. They first employ Angluin’s algorithm 
for learning a finite-state machine [3], which serves as the discrete structure of the 
hybrid automaton, before they decorate the automaton with continuous dynam- 
ics. This strict separation inherently makes their approach offline. The work by 
Summerville et al., based on least-squares regression, requires an exhaustive construction of all possible models and then optimizes a cost function over all of them [20]. Lamrani et al. learn a completely deterministic model with urgent
transitions using ideas from information theory [12]. 
2 Preliminaries


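Sets. Let $\mathbb{R}$, $\mathbb{R}_{\geq 0}$, and $\mathbb{N}$ denote the sets of real numbers, non-negative real numbers, and natural numbers, respectively. We write $\mathbf{x}$ for points $(x_1, \ldots, x_n)$ in $\mathbb{R}^n$. Let $\mathit{cpoly}(n)$ be the set of compact and convex polyhedral sets over $\mathbb{R}^n$. A set $X \in \mathit{cpoly}(n)$ is characterized by its set of vertices $\mathit{vert}(X)$. For a set of points $Y$, $\mathit{chull}(Y) \in \mathit{cpoly}(n)$ denotes the convex hull. Given a set $X \in \mathit{cpoly}(n)$ and $\varepsilon \in \mathbb{R}_{\geq 0}$, we define the $\varepsilon$-bloating of $X$ as $[\![X]\!]_\varepsilon := \{\mathbf{x} \in \mathbb{R}^n \mid \exists \mathbf{x}_0 \in X : \|\mathbf{x} - \mathbf{x}_0\| \leq \varepsilon\} \in \mathit{cpoly}(n)$, where $\|\cdot\|$ is the infinity norm. Given an interval $I = [l, u] \in \mathit{cpoly}(1)$, $\mathit{lb}(I) = l$ and $\mathit{ub}(I) = u$ denote its lower and upper bound.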


Functions and Sequences. Given a function $f$, let $\mathit{dom}(f)$ resp. $\mathit{img}(f)$ denote its domain resp. image. Let $f|_A$ denote the restriction of $f$ to domain $A \subseteq \mathit{dom}(f)$. We define a distance between functions $f$ and $g$ with the same domain and codomain by $d(f, g) := \max_{t \in \mathit{dom}(f)} \|f(t) - g(t)\|$. A sequence of length $m$ is a function $s : D \to A$ over an ordered finite domain $D = \{i_1, \ldots, i_m\} \subset \mathbb{N}$ and a set $A$, and we write $\mathit{len}(s)$ to denote the length of $s$. A sequence $s$ is also represented by enumerating its elements, as in $s(i_1), \ldots, s(i_m)$.


Affine and Piecewise-Linear Functions. An affine piece is a function $p : I \to \mathbb{R}^n$ over an interval $I = [t_0, t_1] \subset \mathbb{R}$ defined as $p(t) = \mathbf{a} t + \mathbf{b}$ where $\mathbf{a}, \mathbf{b} \in \mathbb{R}^n$. Given an affine piece $p$, $\mathit{init}(p)$ denotes the start point $p(t_0)$, $\mathit{end}(p)$ denotes the end point $p(t_1)$, and $\mathit{slope}(p)$ denotes the slope $\mathbf{a}$. We call two affine pieces $p$ and $p'$ adjacent if $\mathit{end}(p) = \mathit{init}(p')$ and $\mathit{ub}(\mathit{dom}(p)) = \mathit{lb}(\mathit{dom}(p'))$. For $m \in \mathbb{N}$, an $m$-piecewise-linear ($m$-PWL) function $f : I \to \mathbb{R}^n$ over interval $I = [0, T] \subset \mathbb{R}$ consists of $m$ affine pieces $p_1, \ldots, p_m$ such that $I = \bigcup_{1 \leq j \leq m} \mathit{dom}(p_j)$, $f(t) = p_j(t)$ for $t \in \mathit{dom}(p_j)$, and for every $1 < j \leq m$ we have $\mathit{end}(p_{j-1}) = \mathit{init}(p_j)$. We show a 3-PWL function in Fig. 1 on the left. Let $\mathit{pieces}(f)$ denote the set of affine pieces of $f$. We refer to $f$ and the sequence $p_1, \ldots, p_m$ interchangeably and write "PWL function" if $m$ is clear from the context. A kink of a PWL function is the point between two adjacent pieces. Given a PWL function $f : I \to \mathbb{R}^n$ and a value $\varepsilon \in \mathbb{R}_{\geq 0}$, the $\varepsilon$-tube of $f$ is the function $\mathit{tube}_{f,\varepsilon} : I \to \mathit{cpoly}(n)$ such that

$$\mathit{tube}_{f,\varepsilon}(t) = [\![f(t)]\!]_\varepsilon.$$
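To make these notions concrete, the following Python sketch covers the one-dimensional case ($n = 1$), where the infinity norm is just the absolute value. The class `Piece` and the helper functions are our own illustrative names, not taken from the implementation described in Sect. 5. The distance computation relies on the fact that $f - g$ is affine between consecutive breakpoints of $f$ and $g$, so $d(f, g)$ is attained at one of these breakpoints.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Piece:
    """1-D affine piece p on [t0, t1] with init(p) = x0 and end(p) = x1 (t0 < t1)."""
    t0: float
    t1: float
    x0: float
    x1: float

    @property
    def slope(self) -> float:
        return (self.x1 - self.x0) / (self.t1 - self.t0)

    def __call__(self, t: float) -> float:
        return self.x0 + self.slope * (t - self.t0)


def is_pwl(pieces: List[Piece]) -> bool:
    """Consecutive pieces must be adjacent: end(p_{j-1}) = init(p_j) and ub = lb."""
    return all(p.t1 == q.t0 and p.x1 == q.x0 for p, q in zip(pieces, pieces[1:]))


def evaluate(f: List[Piece], t: float) -> float:
    """f(t); at a kink both adjacent pieces agree, so the first match is used."""
    for p in f:
        if p.t0 <= t <= p.t1:
            return p(t)
    raise ValueError(f"t = {t} is outside dom(f)")


def distance(f: List[Piece], g: List[Piece]) -> float:
    """d(f, g) = max_t |f(t) - g(t)| for PWL functions on the same domain.

    f - g is affine between consecutive breakpoints of f and g, so the
    maximum is attained at one of those breakpoints."""
    times = sorted({p.t0 for p in f + g} | {p.t1 for p in f + g})
    return max(abs(evaluate(f, t) - evaluate(g, t)) for t in times)


def in_tube(f: List[Piece], eps: float, t: float, x: float) -> bool:
    """x in tube_{f,eps}(t), i.e. |x - f(t)| <= eps."""
    return abs(x - evaluate(f, t)) <= eps


# The 3-PWL function f_0 of Fig. 1: slopes 1, 0, 1 with kinks at t = 1 and t = 2.
f0 = [Piece(0, 1, 1, 2), Piece(1, 2, 2, 2), Piece(2, 3, 2, 3)]
```

For example, the straight line from $(0, 1)$ to $(3, 3)$ has distance $1/3$ to $f_0$, attained at the kinks $t = 1$ and $t = 2$.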


Graphs. A graph is a pair $(V, E)$ of a finite set $V$ and a relation $E \subseteq V \times V$. A path $\pi$ in $(V, E)$ is a sequence $v_1, \ldots, v_m$ with $(v_{j-1}, v_j) \in E$ for $1 < j \leq m$.


Hybrid Automata. We consider a particular class of hybrid automata [1,11]. 


Definition 1. An $n$-dimensional linear hybrid automaton (LHA) is a tuple $H = (Q, E, X, \mathit{Flow}, \mathit{Inv}, \mathit{Grd})$, where (1) $Q$ is a finite set of modes, (2) $E \subseteq Q \times Q$ is a transition relation, (3) $X = \mathbb{R}^n$ is the continuous state-space, (4) $\mathit{Flow} : Q \to \mathbb{R}^n$ is the flow function, (5) $\mathit{Inv} : Q \to \mathit{cpoly}(n)$ is the invariant function, and (6) $\mathit{Grd} : E \to \mathit{cpoly}(n)$ is the guard function.


We sometimes annotate the elements of LHA $H$ by a subscript, as in $Q_H$ for the set of modes. We refer to $(Q_H, E_H)$ as the graph of LHA $H$.

An LHA evolves continuously according to the flow function in each mode. The behavior starts in some mode $q \in Q$ and some continuous state $\mathbf{x} \in \mathit{Inv}(q)$. For every mode $q \in Q$, the continuous evolution follows the differential equation $\dot{\mathbf{x}} = \mathit{Flow}(q)$ while satisfying the invariant $\mathit{Inv}(q)$. The behavior can switch from one mode $q_1$ to another mode $q_2$ if there is a transition $(q_1, q_2) \in E$ and the guard $\mathit{Grd}((q_1, q_2))$ is satisfied. During a switch, the continuous state does not change. This type of system is sometimes called a switched linear hybrid system [13].


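Definition 2. Given an $n$-dimensional LHA $H = (Q, E, X, \mathit{Flow}, \mathit{Inv}, \mathit{Grd})$, an execution $\sigma$ is a triple $\sigma = (\mathcal{I}, \gamma, \delta)$, where $\mathcal{I}$ is a sequence of consecutive intervals $[t_0, t_1], [t_1, t_2], \ldots, [t_{m-1}, t_m]$ with $[\mathcal{I}] := \bigcup_{1 \leq j \leq m} \mathcal{I}(j)$, and $\gamma : [\mathcal{I}] \to \mathbb{R}^n$ and $\delta : \{1, \ldots, m\} \to Q$ are functions with the following restrictions:

- for all $1 \leq j \leq m$, $\gamma(t) \in \mathit{Inv}(\delta(j))$ for $t \in \mathcal{I}(j)$ and $\dot{\gamma}(t') = \mathit{Flow}(\delta(j))$ for all $t'$ in the interior of $\mathcal{I}(j)$, i.e., $\gamma|_{\mathcal{I}(j)}$ is an affine function satisfying the invariant and following the flow, and

- for all $1 \leq j < m$, $(\delta(j), \delta(j+1)) \in E$ and $\gamma(t) \in \mathit{Grd}((\delta(j), \delta(j+1)))$ where $t = \mathit{ub}(\mathcal{I}(j))$, i.e., if a transition is taken, then the guard is satisfied.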


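We denote the set of all executions of $H$ by $\mathit{exec}(H)$. Given an LHA $H$, we say that an execution $\sigma$ follows a path $\pi$ in $H$, that is, in the graph $(Q_H, E_H)$, denoted as $\sigma \rightsquigarrow \pi$, if $\mathit{len}(\mathcal{I}) = \mathit{len}(\pi)$ and $\delta(j) = \pi(j)$ for every $1 \leq j \leq \mathit{len}(\mathcal{I})$.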
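The restrictions of Definition 2 can be checked mechanically. The sketch below assumes $n = 1$, invariants and guards given as closed intervals, and an execution described by its switching times, its mode sequence $\delta$, and its initial state. Since $\gamma$ is affine on each $\mathcal{I}(j)$ and invariants are convex, checking the invariant at both ends of each interval suffices. The `LHA` encoding, the name `run_execution`, and the numerical tolerance are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Interval = Tuple[float, float]  # closed interval [lo, hi], an element of cpoly(1)


@dataclass
class LHA:
    """1-D LHA: modes are the keys of `flow`; the transitions E are the keys of `grd`."""
    flow: Dict[str, float]
    inv: Dict[str, Interval]
    grd: Dict[Tuple[str, str], Interval]


def contains(iv: Interval, x: float, tol: float = 1e-9) -> bool:
    return iv[0] - tol <= x <= iv[1] + tol


def run_execution(h: LHA, times: List[float], modes: List[str],
                  x0: float) -> Optional[List[float]]:
    """Check (I, gamma, delta) with I(j) = [times[j-1], times[j]], delta(j) = modes[j-1]
    and gamma starting in x0. Returns gamma at t_0, ..., t_m, or None if a
    restriction of Definition 2 is violated."""
    assert len(times) == len(modes) + 1
    states = [x0]
    for j, q in enumerate(modes):
        x_start = states[-1]
        x_end = x_start + h.flow[q] * (times[j + 1] - times[j])  # follow the flow
        if not (contains(h.inv[q], x_start) and contains(h.inv[q], x_end)):
            return None  # invariant violated on I(j)
        if j + 1 < len(modes):
            e = (q, modes[j + 1])
            if e not in h.grd or not contains(h.grd[e], x_end):
                return None  # missing transition or guard not satisfied
        states.append(x_end)
    return states


# Canonical automaton of f_0 from Fig. 1 (q0: slope 1, q1: slope 0), without bloating.
h0 = LHA(flow={"q0": 1.0, "q1": 0.0},
         inv={"q0": (1.0, 3.0), "q1": (2.0, 2.0)},
         grd={("q0", "q1"): (2.0, 2.0), ("q1", "q0"): (2.0, 2.0)})
```

Switching at $t = 1.5$ instead of at $t = 1$ is rejected, because the state $2.5$ reached at that time violates the guard $[2, 2]$.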


From Time-series Data to PWL Functions. Experimental data typically comes as time series, i.e., data is only available at sampled points in time. A time series is a sampling $s : D \to \mathbb{R}^n$ over a finite time domain $D \subset [0, T]$. Since the LHA model features piecewise-linear executions, we focus on piecewise-linear approximation of the data. PWL functions can approximate any continuous behavior with arbitrary precision. There are several valid choices for approximating data. For a single time series, linear interpolation gives a perfect fit at the sample points but introduces many kinks; other algorithms minimize the number of kinks for a given error bound [6,9]. One can preprocess multiple time series into a single PWL function using, e.g., linear regression. In this paper, we leave the choice of abstraction open and assume that the input is given as PWL functions.
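The simplest of these choices can be illustrated in a few lines of Python (1-D, pieces as `(t0, t1, x0, x1)` tuples; all names are ours): `interpolate` is exact linear interpolation with one piece per pair of consecutive samples, and `merge_collinear` removes only those kinks where the slope does not change. The latter is a lossless simplification, not one of the kink-minimizing algorithms of [6,9].

```python
from typing import List, Sequence, Tuple

Piece = Tuple[float, float, float, float]  # (t0, t1, x0, x1): 1-D affine piece


def slope(p: Piece) -> float:
    t0, t1, x0, x1 = p
    return (x1 - x0) / (t1 - t0)


def interpolate(times: Sequence[float], values: Sequence[float]) -> List[Piece]:
    """Linear interpolation of a time series s: D -> R; exact at every sample."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two samples with matching lengths")
    if any(a >= b for a, b in zip(times, times[1:])):
        raise ValueError("sample times must be strictly increasing")
    return list(zip(times, times[1:], values, values[1:]))


def merge_collinear(pieces: List[Piece], tol: float = 1e-12) -> List[Piece]:
    """Join adjacent pieces with equal slope; the PWL function itself is unchanged."""
    merged = [pieces[0]]
    for p in pieces[1:]:
        q = merged[-1]
        if abs(slope(p) - slope(q)) <= tol:
            merged[-1] = (q[0], p[1], q[2], p[3])  # extend the previous piece
        else:
            merged.append(p)
    return merged
```

With samples $1, 2, 2, 2, 3$ at times $0, \ldots, 4$, interpolation yields four pieces, and merging the two collinear middle pieces yields three.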


3 Synthesis of Linear Hybrid Automata 


In this section, we specify the synthesis problem, consider two different specifications, synchronous and asynchronous, and present an automated approach for solving the synchronous problem. The overall goal is to synthesize a linear hybrid automaton from a set of PWL functions such that the automaton captures the behavior described by each of the PWL functions up to a bound $\varepsilon$.


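Definition 3 (Soundness). Given a PWL function $f$ and a value $\varepsilon \in \mathbb{R}_{\geq 0}$, we say that an LHA $H$ $\varepsilon$-captures $f$ if there exists an execution $\sigma = (\mathcal{I}, \gamma, \delta)$ in $\mathit{exec}(H)$ with $d(f, \gamma) \leq \varepsilon$.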


The value $\varepsilon$ quantifies the acceptable deviation of an execution's continuous function $\gamma$ from the PWL function $f$. For $\varepsilon = 0$, $\gamma$ must precisely follow $f$. A straightforward formulation of the problem we want to solve is the following.
Problem 1 (Synthesis). Given a finite set of PWL functions $F$ and $\varepsilon \in \mathbb{R}_{\geq 0}$, construct an LHA $H$ that $\varepsilon$-captures every function $f \in F$.


Observe that this problem is not well-posed, as it can be satisfied by an automaton that exhibits an excessive amount of behavior. Hence our second goal for the synthesis algorithm is to constrain the automaton's size. We start with the synthesis of an LHA with a minimal number of modes.


3.1 Synchronous Switching Specification 


For now, we require that the executions in the LHA switch synchronously with the 
given PWL functions. Under this assumption, we tackle a refinement of Problem 1: 


Problem 2 (Synchronous synthesis). Given a finite set of PWL functions $F$ and a value $\varepsilon \in \mathbb{R}_{\geq 0}$, construct an LHA $H$ that $\varepsilon$-captures every function $f \in F$ synchronously, and furthermore require that $H$ has the minimal number of modes.


In the following, we present an algorithm to solve Problem 2. The idea is, given a PWL function $f$, to synthesize an execution $\sigma$ that is $\varepsilon$-close to $f$. Recall that the continuous function $\gamma$ of an execution is essentially just another PWL function. Any LHA that contains the execution $\sigma$ has to comprise a mode for each different slope in $\gamma$. Thus a minimal number of modes can be achieved by minimizing the number of different slopes in $\gamma$. By fixing a number of different slopes, we encode the existence of $\gamma$ as a logical formula $\varphi_{f,\varepsilon}$, which will be satisfiable if and only if there exists a suitable function $\gamma$.

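Let $m$ be the number of affine pieces $p_1, \ldots, p_m$ in $f$ with $\mathit{dom}(p_j) = [t_{j-1}, t_j]$ for $1 \leq j \leq m$. We refer to the time instants $t_j$ as the switching times of $f$, and to $\mathbf{x}_j = f(t_j)$ as the switching points of $f$. Fixing a number $\ell \in \mathbb{N}$, we want to construct a PWL function $\gamma_\ell$ consisting of $m$ affine pieces $p'_1, \ldots, p'_m$ with $\ell$ different slopes, with the same switching times as in $f$, with switching points $\mathbf{y}_0, \ldots, \mathbf{y}_m$ $\varepsilon$-close to those in $f$ (which is necessary and sufficient for $d(f, \gamma_\ell) \leq \varepsilon$, because $f - \gamma_\ell$ is affine between consecutive switching times), and with unknown slopes $\mathbf{b}_1 = \mathit{slope}(p'_1), \ldots, \mathbf{b}_m = \mathit{slope}(p'_m)$.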
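We define the logical formula

$$\varphi_{f,\varepsilon}(\ell) := \bigwedge_{j=1}^{m} \mathbf{y}_j = \mathbf{y}_{j-1} + \mathbf{b}_j (t_j - t_{j-1}) \;\wedge\; \bigwedge_{j=0}^{m} \mathbf{y}_j \in [\![\mathbf{x}_j]\!]_\varepsilon \;\wedge\; \bigwedge_{j=1}^{m} \bigvee_{k=1}^{\ell} \mathbf{b}_j = \mathbf{c}_k,$$

which is satisfiable if and only if there exists a suitable PWL function $\gamma_\ell$. For lifting to a set of functions $F$, we define the formula $\varphi_{F,\varepsilon}(\ell) := \bigwedge_{f \in F} \varphi_{f,\varepsilon}(\ell)$, in which the slope variables $\mathbf{c}_1, \ldots, \mathbf{c}_\ell$ are shared among all functions. These formulae fall into the theory of linear arithmetic and can be effectively solved by an SMT solver. Now, we can state the following results.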


Lemma 1. Let $F$ be a finite set of PWL functions and $\varepsilon \in \mathbb{R}_{\geq 0}$. If $\varphi_{F,\varepsilon}(\ell)$ is satisfiable for some integer value $\ell$, then there exists a set of PWL functions $F'$ such that $|F'| = |F|$, each function in $F$ is $\varepsilon$-close to some function in $F'$, and the number of distinct slopes in $F'$ does not exceed $\ell$.
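The formula $\varphi_{f,\varepsilon}(\ell)$ is meant to be handed to an SMT solver, which is not shown here. As a lightweight illustration, the following Python function (our own, for $n = 1$) evaluates the three conjuncts of $\varphi_{f,\varepsilon}(\ell)$ under a given candidate assignment, which is also one way to double-check a model returned by a solver.

```python
from typing import Sequence


def satisfies_phi(ts: Sequence[float], xs: Sequence[float], eps: float,
                  ys: Sequence[float], bs: Sequence[float], cs: Sequence[float],
                  tol: float = 1e-9) -> bool:
    """Evaluate phi_{f,eps}(l) for one 1-D PWL function f under an assignment.

    ts: switching times t_0..t_m of f;  xs: switching points x_j = f(t_j);
    ys: candidate points y_0..y_m;      bs: candidate slopes b_1..b_m;
    cs: the l slope values c_1..c_l (shared by all functions in F)."""
    m = len(ts) - 1
    if not (len(xs) == len(ys) == m + 1 and len(bs) == m):
        raise ValueError("inconsistent lengths")
    # y_j = y_{j-1} + b_j (t_j - t_{j-1}) for 1 <= j <= m
    follows_slopes = all(
        abs(ys[j] - (ys[j - 1] + bs[j - 1] * (ts[j] - ts[j - 1]))) <= tol
        for j in range(1, m + 1))
    # y_j in [[x_j]]_eps for 0 <= j <= m
    eps_close = all(abs(ys[j] - xs[j]) <= eps + tol for j in range(m + 1))
    # every b_j equals one of c_1..c_l
    few_slopes = all(any(abs(b - c) <= tol for c in cs) for b in bs)
    return follows_slopes and eps_close and few_slopes
```

For $f_0$ from Fig. 1 and $\varepsilon = 0.5$, the single slope $\mathbf{c}_1 = 2/3$ with $\mathbf{y} = (1, 5/3, 7/3, 3)$ satisfies $\varphi_{f_0,0.5}(1)$; for $\varepsilon = 0.25$ this particular assignment violates the closeness conjunct, which on its own does not prove unsatisfiability.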
The set $F'$ can be extracted from a satisfying assignment. We now define a hybrid automaton with a minimal number of modes that 0-captures a given PWL function.


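Definition 4 (Canonical automaton). Let $f$ be a PWL function. The canonical automaton of $f$ is $H_f := (Q, E, \mathbb{R}^n, \mathit{Flow}, \mathit{Inv}, \mathit{Grd})$ with

- $Q = \{q_{\mathbf{a}} \mid \exists p \in \mathit{pieces}(f) : \mathit{slope}(p) = \mathbf{a}\}$,
- $E = \{(q_{\mathbf{a}}, q_{\mathbf{a}'}) \mid \exists p, p' \in \mathit{pieces}(f) \text{ adjacent} : \mathit{slope}(p) = \mathbf{a}, \mathit{slope}(p') = \mathbf{a}'\}$,
- $\mathit{Flow}(q_{\mathbf{a}}) = \mathbf{a}$,
- $\mathit{Inv}(q_{\mathbf{a}}) = \mathit{chull}\big(\bigcup \{\mathit{img}(p) \mid p \in \mathit{pieces}(f), \mathit{slope}(p) = \mathbf{a}\}\big)$, and
- $\mathit{Grd}((q_{\mathbf{a}}, q_{\mathbf{a}'})) = \mathit{chull}\big(\{\mathit{end}(p) \mid p, p' \in \mathit{pieces}(f) \text{ adjacent}, \mathit{slope}(p) = \mathbf{a}, \mathit{slope}(p') = \mathbf{a}'\}\big)$.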


Lemma 2. Given a PWL function $f$, the canonical automaton $H_f$ 0-captures $f$, and every LHA that 0-captures $f$ has at least as many modes as $H_f$.
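A minimal sketch of Definition 4 for $n = 1$, with pieces as `(t0, t1, x0, x1)` tuples and each mode $q_{\mathbf{a}}$ represented by its slope. In one dimension, $\mathit{img}(p)$ of an affine piece is the interval between its endpoints, and convex hulls of intervals reduce to a min/max. Grouping pieces by exact floating-point equality of slopes is a simplification we make here; noisy data would call for a tolerance or clustering.

```python
from typing import Dict, List, Tuple

Piece = Tuple[float, float, float, float]  # (t0, t1, x0, x1)
Interval = Tuple[float, float]


def slope(p: Piece) -> float:
    t0, t1, x0, x1 = p
    return (x1 - x0) / (t1 - t0)


def hull(*ivs: Interval) -> Interval:
    """Convex hull of 1-D intervals."""
    return (min(lo for lo, _ in ivs), max(hi for _, hi in ivs))


def canonical(f: List[Piece]):
    """Canonical automaton H_f as (flow, inv, grd); mode q_a is represented by a."""
    flow: Dict[float, float] = {}
    inv: Dict[float, Interval] = {}
    grd: Dict[Tuple[float, float], Interval] = {}
    for p in f:
        a = slope(p)
        img = (min(p[2], p[3]), max(p[2], p[3]))  # img(p)
        flow[a] = a
        inv[a] = hull(inv[a], img) if a in inv else img
    for p, p_next in zip(f, f[1:]):  # adjacent pieces induce transitions
        e = (slope(p), slope(p_next))
        point = (p[3], p[3])  # end(p) as a degenerate interval
        grd[e] = hull(grd[e], point) if e in grd else point
    return flow, inv, grd
```

For $f_0$ from Fig. 1 this yields a slope-1 mode with invariant $[1, 3]$, a slope-0 mode with invariant $[2, 2]$, and two transitions with guard $[2, 2]$, i.e., the automaton of Fig. 1 before bloating.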


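Definition 5 (Merging). Given two hybrid automata $H_i = (Q_i, E_i, X, \mathit{Flow}_i, \mathit{Inv}_i, \mathit{Grd}_i)$, $i = 1, 2$, with $Q_1 \cap Q_2 = \emptyset$, let $Q_{\mathbf{a}} \subseteq Q_1 \uplus Q_2$ be the set of modes with flow equal to $\mathbf{a}$. We define the merging of $H_1$ and $H_2$ as $H_1 \uplus H_2 := (Q, E, X, \mathit{Flow}, \mathit{Inv}, \mathit{Grd})$ with $Q = \{q_{\mathbf{a}} \mid \mathbf{a} \in \mathbb{R}^n, Q_{\mathbf{a}} \neq \emptyset\}$, $E = \{(q_{\mathbf{a}}, q_{\mathbf{a}'}) \mid (q, q') \in E_1 \cup E_2, q \in Q_{\mathbf{a}}, q' \in Q_{\mathbf{a}'}\}$, $\mathit{Flow}(q_{\mathbf{a}}) = \mathbf{a}$, $\mathit{Inv}(q_{\mathbf{a}}) = \mathit{chull}\big(\bigcup \{\mathit{Inv}_i(q) \mid i \in \{1, 2\}, q \in Q_{\mathbf{a}} \cap Q_i\}\big)$, and $\mathit{Grd}((q_{\mathbf{a}}, q_{\mathbf{a}'})) = \mathit{chull}\big(\bigcup \{\mathit{Grd}_i((q, q')) \mid i \in \{1, 2\}, (q, q') \in E_i, q \in Q_{\mathbf{a}}, q' \in Q_{\mathbf{a}'}\}\big)$.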
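Definition 5 can be sketched for $n = 1$ as follows: automata are `(flow, inv, grd)` dictionaries with disjoint mode names, modes are grouped by their flow value, and invariants and guards of grouped modes are combined by convex hull. Because this grouping is associative, the function accepts any number of automata, as needed for the merging $\biguplus_{f \in F'} H_f$ in Theorem 1. Names and encoding are illustrative assumptions.

```python
from typing import Dict, Hashable, Tuple

Interval = Tuple[float, float]
Automaton = Tuple[Dict[Hashable, float],                      # Flow
                  Dict[Hashable, Interval],                   # Inv
                  Dict[Tuple[Hashable, Hashable], Interval]]  # Grd (keys form E)


def hull(a: Interval, b: Interval) -> Interval:
    return (min(a[0], b[0]), max(a[1], b[1]))


def merge(*automata: Automaton) -> Automaton:
    """Merge LHAs with disjoint mode names; modes with flow a collapse into q_a (named a)."""
    flow, inv, grd = {}, {}, {}
    for fl, iv, gd in automata:
        for q, a in fl.items():
            flow[a] = a
            inv[a] = hull(inv[a], iv[q]) if a in inv else iv[q]
        for (q, r), g in gd.items():
            e = (fl[q], fl[r])  # transition between the merged modes
            grd[e] = hull(grd[e], g) if e in grd else g
    return flow, inv, grd
```

Merging an automaton with a slope-0 mode on $[2, 2]$ and another with a slope-0 mode on $[0, 1]$ produces a single slope-0 mode whose invariant is $[0, 2]$.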


Theorem 1. Given a finite set of PWL functions $F$ and a value $\varepsilon \in \mathbb{R}_{\geq 0}$, let $L$ be the smallest integer such that $\varphi_{F,\varepsilon}(L)$ is satisfiable and let $F'$ be a set of PWL functions corresponding to a satisfying assignment. Then the merging of canonical automata $\biguplus_{f \in F'} H_f$ solves Problem 2.


The above synthesis algorithm works well with short and low-dimensional 
PWL functions but does not scale to realistic problem sizes due to the heavy use 
of disjunctions. We next address scalability with a new online algorithm. 


3.2 Asynchronous Switching Specification 


We now change the requirement from the previous subsection (minimality in the model's discrete structure) to tightness in the model's state-space constraints. Intuitively, for every vertex $\mathbf{v}$ of an invariant or guard in $H$ there should be some witness data $f \in F$ that is close to $\mathbf{v}$ (at some point in time).


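Definition 6 (Precision). Given an LHA $H = (Q, E, X, \mathit{Flow}, \mathit{Inv}, \mathit{Grd})$, let $\mathit{vert}(H)$ denote the union of the vertices of the invariants and guards:

$$\mathit{vert}(H) = \bigcup_{q \in Q} \mathit{vert}(\mathit{Inv}(q)) \cup \bigcup_{e \in E} \mathit{vert}(\mathit{Grd}(e)).$$

Given a set of PWL functions $F$ and a value $\varepsilon \in \mathbb{R}_{\geq 0}$, we say that $H$ is $\varepsilon$-precise (with respect to $F$) if the following holds:

$$\forall \mathbf{v} \in \mathit{vert}(H)\ \exists f \in F\ \exists t \in \mathit{dom}(f) : \|\mathbf{v} - f(t)\| \leq \varepsilon.$$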
The restriction to the vertices is reasonable because all sets are compact convex polyhedra, each of which is the convex hull of its vertices. Note that $\varepsilon$-capturing compares functions to the automaton's executions, while $\varepsilon$-precision compares functions to the automaton's state-space.
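For $n = 1$, $\varepsilon$-precision can be checked directly: every invariant and guard is an interval whose vertices are its two endpoints, and the distance from a point to the image of an affine piece is its distance to the interval between the piece's endpoints. The helper names below are ours, and the check is a sketch under these 1-D assumptions.

```python
from typing import Dict, Hashable, List, Tuple

Piece = Tuple[float, float, float, float]  # (t0, t1, x0, x1)
Interval = Tuple[float, float]


def dist_to_pwl(v: float, f: List[Piece]) -> float:
    """min over t of |v - f(t)|; img(p) of a 1-D piece is [min(x0, x1), max(x0, x1)]."""
    return min(max(0.0, min(x0, x1) - v, v - max(x0, x1)) for _, _, x0, x1 in f)


def is_precise(inv: Dict[Hashable, Interval], grd: Dict[Hashable, Interval],
               F: List[List[Piece]], eps: float) -> bool:
    """Definition 6: every vertex of every invariant and guard is eps-close to some f in F."""
    vertices = {x for lo, hi in list(inv.values()) + list(grd.values()) for x in (lo, hi)}
    return all(any(dist_to_pwl(v, f) <= eps for f in F) for v in vertices)
```

For $f_0$ from Fig. 1, the canonical automaton is 0-precise, and its $0.25$-bloating is $0.25$-precise but not $0.1$-precise, since its outermost vertices lie at distance exactly $0.25$ from $f_0$.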

We also relax the limitation to synchronously switching executions. Instead, we allow asynchronous switching, characterized as follows: for every function $f$ $\varepsilon$-captured by $H$, there exists an execution $\sigma \in \mathit{exec}(H)$ with the same number of switches as there are kinks in $f$, i.e., $\mathit{len}(\mathcal{I}) = |\mathit{pieces}(f)|$, and where the $j$-th switch in the execution takes place during the time period between the kinks $j - 1$ and $j + 1$. We close this section with the new problem statement (a refinement of Problem 1), and present a solution in the next section.


Problem 3 (Asynchronous synthesis). Given a finite set of PWL functions $F$ and a value $\varepsilon \in \mathbb{R}_{\geq 0}$, construct an $\varepsilon$-precise LHA $H$ that $\varepsilon$-captures every function $f \in F$ asynchronously.


4 Membership-based Synthesis Approach 


In this section, we present an algorithm for solving Problem 3. The core of the algorithm is a reachability computation that provides the polyhedral regions where executions of an LHA that are $\varepsilon$-close to a given PWL function $f$ are allowed to switch. More precisely, given a path $\pi$ and the $\varepsilon$-tube of $f$, the algorithm iteratively constructs the set inside the $\varepsilon$-tube where an execution following $\pi$ can switch without escaping from the tube. These reachable sets are, in general, computed with respect to a starting compact convex polyhedron $P$, a pair of adjacent affine pieces $p$ and $p'$, and a pair of modes $q$ and $q'$ along $\pi$.


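Definition 7. Given an LHA $H = (Q, E, X, \mathit{Flow}, \mathit{Inv}, \mathit{Grd})$ and a value $\varepsilon \in \mathbb{R}_{\geq 0}$, a reachable switching set $\mathit{switch}_H(P, p, p', q, q')$ from a set $P$ with respect to two adjacent affine pieces $p, p'$ and a path $\pi := q, q'$ in $H$ is defined as

$$\{\mathbf{x} \in \mathit{Grd}((q, q')) \mid \exists \sigma = (\mathcal{I}, \gamma, \delta) \in \mathit{exec}(H) : \sigma \rightsquigarrow \pi,\ \mathit{dom}(\gamma) = \mathit{dom}(p) \cup \mathit{dom}(p'),\ \gamma(\mathit{lb}(\mathit{dom}(p))) \in P,\ \gamma(t) \in \mathit{tube}_{p,\varepsilon}(t) \cup \mathit{tube}_{p',\varepsilon}(t), \text{ and } \mathbf{x} = \gamma(\mathit{ub}(\mathcal{I}(1)))\}.$$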


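Inductive Reachable Switching Computation. Given an LHA $H$, an $m$-PWL function $f = p_1, \ldots, p_m$, a value $\varepsilon \in \mathbb{R}_{\geq 0}$, and a path $\pi = q_1, \ldots, q_m$ in the graph $(Q_H, E_H)$, we compute the reachable switching set $P^\pi_j$ for every $0 \leq j \leq m$:

- $P^\pi_0 := \mathit{Inv}_H(q_1) \cap \mathit{tube}_{f,\varepsilon}(0)$,

- $P^\pi_j := \mathit{switch}_H(P^\pi_{j-1}, p_j, p_{j+1}, q_j, q_{j+1})$ for $1 \leq j < m$, and

- $P^\pi_m := \{\mathbf{x} \in \mathit{Inv}_H(q_m) \mid \exists \sigma = (\mathcal{I}, \gamma, \delta) \in \mathit{exec}(H) : \sigma \rightsquigarrow q_m,\ \gamma(\mathit{lb}(\mathit{dom}(p_m))) \in P^\pi_{m-1},\ \mathit{dom}(\gamma) = \mathit{dom}(p_m),\ \gamma(t) \in \mathit{tube}_{p_m,\varepsilon}(t), \text{ and } \mathbf{x} = \gamma(\mathit{ub}(\mathit{dom}(p_m)))\}$.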


We denote the set of all reachable switching sets $P^\pi_j$ by $\mathcal{P}^\pi$. We are now ready to present the complete synthesis algorithm.


Membership-Based Synthesis of Linear Hybrid Automata 305 


Algorithm 1. SYNTHESIS

Input: A set of PWL functions F = {f₀, …, f_N} and a value ε ∈ ℝ≥0
Output: A linear hybrid automaton H that solves Problem 3
1:  H := INITLHA(f₀, ε)                  ▷ construct initial model for ε-capturing f₀
2:  for f ∈ F \ {f₀} do
3:      (ans, π) := MEMBERSHIP(f, H, ε)
4:      if not ans then
5:          H̃ := RELAXALL(H, f, ε)        ▷ relax model constraints entirely
6:          (ans, π) := MEMBERSHIP(f, H̃, ε)
7:          if ans then
8:              H := RELAXPATH(H, f, ε, π)  ▷ relax model constraints for ε-capturing f
9:          else
10:             H := ADAPT(H, f, ε, π)      ▷ adapt model for ε-capturing f
11: return H
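Algorithm 1 can be transcribed into Python almost line by line. In the sketch below, the five procedures are passed in as callables, because their concrete implementations (polyhedra, reachability) are what the rest of this section describes; the parameter names, type aliases, and toy stand-ins are our own and purely illustrative. The comments refer to the line numbers of Algorithm 1.

```python
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

A = TypeVar("A")    # automaton representation (left abstract)
Fn = TypeVar("Fn")  # PWL function representation
Path = Optional[List[object]]


def synthesis(functions: Sequence[Fn], eps: float,
              init_lha: Callable[[Fn, float], A],
              membership: Callable[[Fn, A, float], Tuple[bool, Path]],
              relax_all: Callable[[A, Fn, float], A],
              relax_path: Callable[[A, Fn, float, Path], A],
              adapt: Callable[[A, Fn, float, Path], A]) -> A:
    h = init_lha(functions[0], eps)                    # line 1
    for f in functions[1:]:                            # line 2
        ans, path = membership(f, h, eps)              # line 3
        if not ans:                                    # line 4
            h_relaxed = relax_all(h, f, eps)           # line 5: temporary relaxation
            ans, path = membership(f, h_relaxed, eps)  # line 6
            if ans:                                    # line 7
                h = relax_path(h, f, eps, path)        # line 8: relax H along path only
            else:                                      # line 9
                h = adapt(h, f, eps, path)             # line 10
    return h                                           # line 11


# Hypothetical toy stand-ins: an "automaton" is the set of functions it captures,
# and full relaxation only helps for even-numbered functions.
calls: List[str] = []


def toy_init(f, eps):
    return {f}


def toy_membership(f, h, eps):
    return (f in h, [f])


def toy_relax_all(h, f, eps):
    return h | {f} if f % 2 == 0 else set(h)


def toy_relax_path(h, f, eps, path):
    calls.append("relax_path")
    return h | {f}


def toy_adapt(h, f, eps, path):
    calls.append("adapt")
    return h | {f}


result = synthesis([1, 2, 3, 1], 0.0, toy_init, toy_membership,
                   toy_relax_all, toy_relax_path, toy_adapt)
```

In this toy run, function 2 is handled by relaxation (line 8), function 3 requires adaptation (line 10), and the repeated function 1 is already captured (line 3).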


4.1 Membership-based Synthesis Algorithm 


The synthesis algorithm outlined in Algorithm 1 computes an LHA $H$ solving Problem 3 for a given finite set of PWL functions $F$ and a value $\varepsilon \in \mathbb{R}_{\geq 0}$. The algorithm initially infers an LHA $H$ that $\varepsilon$-captures the first function $f_0$ of $F$ in an $\varepsilon$-precise manner in line 1. The remaining PWL functions are handled in an iterative loop. For each PWL function $f$, the algorithm performs a membership query in line 3, where it checks if $f$ is $\varepsilon$-captured by the LHA $H$. If the query results in a positive answer ($\mathit{ans} = \mathit{True}$), nothing needs to be done. Otherwise, the query returns a path $\pi$ and the LHA $H$ needs to be modified. The modification of the automaton $H$ is performed in two attempts. The first attempt, in line 5, temporarily enlarges the invariants and guards of $H$. If such a modification is sufficient to let the membership query succeed, the modifications are made permanent in line 8. Otherwise, in the second attempt the algorithm adds new modes and/or transitions to $H$ along the path $\pi$. Below we describe every procedure of Algorithm 1 in detail.


Initialization. The procedure $\mathrm{INITLHA}(f, \varepsilon)$ constructs an initial LHA $H$ that $\varepsilon$-captures an $m$-PWL function $f$. Observe that by Lemma 2 the canonical automaton $H_f$ 0-captures (and hence $\varepsilon$-captures) the function $f$. In order to allow similar dynamical behaviors in a given LHA $H$, the procedure $\mathrm{INITLHA}(f, \varepsilon)$ $\varepsilon$-bloats both invariant and guard polyhedra. The procedure $\mathrm{INITLHA}(f, \varepsilon)$ outputs the $\varepsilon$-bloated canonical automaton $H_f^\varepsilon$ and is illustrated in Fig. 1.


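Definition 8. Given an LHA $H = (Q, E, X, \mathit{Flow}, \mathit{Inv}, \mathit{Grd})$, we define the $\varepsilon$-bloated LHA of $H$ as $H^\varepsilon = (Q, E, X, \mathit{Flow}, \mathit{Inv}^\varepsilon, \mathit{Grd}^\varepsilon)$ where $\mathit{Inv}^\varepsilon(q) = [\![\mathit{Inv}(q)]\!]_\varepsilon$ for every $q \in Q$ and $\mathit{Grd}^\varepsilon(e) = [\![\mathit{Grd}(e)]\!]_\varepsilon$ for every $e \in E$.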


Lemma 3. Given a PWL function $f$ and $\varepsilon \in \mathbb{R}_{\geq 0}$, $H_f^\varepsilon$ $\varepsilon$-captures $f$.


Fig. 1. Example describing the procedure $\mathrm{INITLHA}(f, \varepsilon)$ for a 3-PWL function $f = f_0$ (depicted on the left). The function $f_0$ consists of three pieces $p_0, p_1, p_2$ with slopes $1, 0, 1$, respectively. The LHA on the right is constructed as follows. Mode $q_0$ corresponds to pieces $p_0$ and $p_2$; the invariant is the $\varepsilon$-bloating of the interval $[1, 3]$ (which is the convex hull of every start and end point in both pieces). Likewise, mode $q_1$ corresponds to piece $p_1$. Transitions and their guards correspond to the kinks of $f_0$ at $t = 1$ and $t = 2$.
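$\mathrm{INITLHA}$ combines Definition 4 (canonical automaton) with Definition 8 ($\varepsilon$-bloating). The following self-contained 1-D sketch, with helper names of our own, computes the automaton of Fig. 1 for $\varepsilon = 0.25$, the value also used in Fig. 4; modes are again named by their slope.

```python
from typing import Dict, List, Tuple

Piece = Tuple[float, float, float, float]  # (t0, t1, x0, x1)
Interval = Tuple[float, float]


def slope(p: Piece) -> float:
    return (p[3] - p[2]) / (p[1] - p[0])


def hull(a: Interval, b: Interval) -> Interval:
    return (min(a[0], b[0]), max(a[1], b[1]))


def bloat(iv: Interval, eps: float) -> Interval:
    """[[iv]]_eps for a 1-D interval."""
    return (iv[0] - eps, iv[1] + eps)


def init_lha(f: List[Piece], eps: float):
    """Epsilon-bloated canonical automaton H_f^eps as (flow, inv, grd)."""
    flow: Dict[float, float] = {}
    inv: Dict[float, Interval] = {}
    grd: Dict[Tuple[float, float], Interval] = {}
    for p in f:                       # Definition 4: modes and invariants
        a, img = slope(p), (min(p[2], p[3]), max(p[2], p[3]))
        flow[a] = a
        inv[a] = hull(inv[a], img) if a in inv else img
    for p, p_next in zip(f, f[1:]):   # Definition 4: transitions and guards
        e, point = (slope(p), slope(p_next)), (p[3], p[3])
        grd[e] = hull(grd[e], point) if e in grd else point
    # Definition 8: bloat every invariant and guard by eps
    return (flow, {q: bloat(iv, eps) for q, iv in inv.items()},
            {e: bloat(g, eps) for e, g in grd.items()})
```

For $f_0$, the slope-1 mode gets the invariant $[\![1, 3]\!]_{0.25} = [0.75, 3.25]$, and the slope-0 mode as well as both guards get $[\![2, 2]\!]_{0.25} = [1.75, 2.25]$.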


Membership. The procedure $\mathrm{MEMBERSHIP}(f, H, \varepsilon)$ checks whether there exists an asynchronous execution $\sigma = (\mathcal{I}, \gamma, \delta)$ in $H$ such that $d(f, \gamma) \leq \varepsilon$ holds. Let us introduce the required notions to formalize the membership problem.


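Definition 9. An execution $\sigma = (\mathcal{I}, \gamma, \delta)$ of an LHA $H$ is consistent with an $m$-PWL function $f$, described by the affine pieces $p_1, \ldots, p_m$, if $\mathit{len}(\mathcal{I}) = m$, $[\mathcal{I}] = \mathit{dom}(f)$, and $\mathit{ub}(\mathcal{I}(j)) \in \mathit{dom}(p_j) \cup \mathit{dom}(p_{j+1})$ for every $1 \leq j < m$.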


Problem 4 (Membership). Given an $m$-PWL function $f$, an LHA $H$, and a value $\varepsilon \in \mathbb{R}_{\geq 0}$, decide if there exists an execution $\sigma = (\mathcal{I}, \gamma, \delta)$ in $\mathit{exec}(H)$ that is consistent with $f$ and such that $d(f, \gamma) \leq \varepsilon$ holds.


The procedure $\mathrm{MEMBERSHIP}(f, H, \varepsilon)$ solves Problem 4 by computing the reachable switching sets for every path $\pi$ of length $m$ in $H$ until finding a path $\pi$ where every reachable switching set $P^\pi_j$ for $0 \leq j \leq m$ is nonempty. Upon finding such a path $\pi$, $\mathrm{MEMBERSHIP}(f, H, \varepsilon)$ returns the answer True together with the path $\pi$. If no such path exists, it returns the answer False. We show an example in Fig. 2(a). We remark that, for a fixed path, Problem 4 is a timestamp-generation problem [2] with the restriction to time intervals for switching and the $\varepsilon$-tube as solution corridor.


Lemma 4. Let $H$ be an LHA and $f$ be an $m$-PWL function. Then there exists a path $\pi$ of length $m$ in $H$ such that the final reachable switching set $P^\pi_m$ is not empty if and only if there exists an execution $\sigma$ in $\mathit{exec}(H)$ solving Problem 4.
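Computing the asynchronous reachable switching sets requires polyhedral operations over time and state (the implementation described in Sect. 5 uses pplpy for polyhedra). As a much simpler, hedged illustration of the path search inside $\mathrm{MEMBERSHIP}$, the sketch below handles only the synchronous case of Sect. 3.1, where switches happen exactly at the kinks of $f$, and only $n = 1$. Each step then reduces to intersecting intervals: trajectory, tube bounds, and invariant are affine or constant in $t$, so checking both ends of each piece suffices. Function names and the dictionary encoding are our own.

```python
from typing import Dict, List, Optional, Tuple

Piece = Tuple[float, float, float, float]  # (t0, t1, x0, x1)
Interval = Tuple[float, float]


def meet(*ivs: Interval) -> Optional[Interval]:
    """Intersection of intervals, or None if it is empty."""
    lo, hi = max(iv[0] for iv in ivs), min(iv[1] for iv in ivs)
    return (lo, hi) if lo <= hi else None


def step(start: Interval, p: Piece, a: float, inv: Interval,
         eps: float) -> Optional[Interval]:
    """States at t1 reachable from `start` at t0 with flow a, inside tube and invariant."""
    t0, t1, x0, x1 = p
    d = a * (t1 - t0)
    ok = meet(start, (x0 - eps, x0 + eps), (x1 - d - eps, x1 - d + eps),
              inv, (inv[0] - d, inv[1] - d))
    return None if ok is None else (ok[0] + d, ok[1] + d)


def membership_sync(f: List[Piece], flow: Dict[str, float], inv: Dict[str, Interval],
                    grd: Dict[Tuple[str, str], Interval], eps: float):
    """Depth-first search over paths of length len(f); returns (True, path) or (False, None)."""
    def dfs(j: int, q: str, start: Interval, path: List[str]) -> Optional[List[str]]:
        end = step(start, f[j], flow[q], inv[q], eps)
        if end is None:
            return None
        if j == len(f) - 1:
            return path
        for (src, dst), g in grd.items():
            if src != q:
                continue
            nxt = meet(end, g)  # the switching state must satisfy the guard
            if nxt is not None:
                found = dfs(j + 1, dst, nxt, path + [dst])
                if found is not None:
                    return found
        return None

    x_init = f[0][2]
    for q in flow:
        start = meet(inv[q], (x_init - eps, x_init + eps))  # analogue of P_0
        if start is not None:
            found = dfs(0, q, start, [q])
            if found is not None:
                return True, found
    return False, None
```

In this synchronous simplification, a path is accepted exactly when every interval along it stays nonempty, mirroring the role of the reachable switching sets in Lemma 4.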


Relaxation. If $\mathrm{MEMBERSHIP}(f, H, \varepsilon)$ returns False, $\mathrm{RELAXALL}(H, f, \varepsilon)$ constructs an automaton $\tilde{H}$ that is equivalent to $H$ except that its invariants and guards are enlarged to allow additional executions inside the $\varepsilon$-tube of $f$. Then, the algorithm computes $\mathrm{MEMBERSHIP}(f, \tilde{H}, \varepsilon)$. If the answer is False again, the algorithm proceeds to the adaptation procedure in line 10. Otherwise (if the answer is True), we obtain a path $\pi$ in $\tilde{H}$. Then the algorithm executes the procedure $\mathrm{RELAXPATH}(H, f, \varepsilon, \pi)$, which extends the constraints of invariants and guards in $H$ for the modes in $\pi$ by taking the convex hull with the corresponding reachable switching sets $P^\pi_j \in \mathcal{P}^\pi$. The relaxation procedure applied on the running example is shown in Fig. 3.

Fig. 2. (a) Example describing the procedure $\mathrm{MEMBERSHIP}(f, H, \varepsilon)$. On the left we depict a 3-PWL function $f_1$ and its $\varepsilon$-tube. On the right we show a possible execution in the LHA from Fig. 1. (b) Given an affine piece $p$, we say that another piece has a similar slope if it does not leave the tube. In the figure, we show the minimal and the maximal allowed slopes by dashed segments.
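A 1-D sketch of the hull step in $\mathrm{RELAXPATH}$. We assume, by analogy with the $\mathit{sim}$ case of Definition 10, that the invariant of the $j$-th mode on the path is enlarged by $P^\pi_{j-1}$ and $P^\pi_j$, and the guard between the $j$-th and $(j+1)$-th mode by $P^\pi_j$. The reachable sets used in the example are made-up values in the spirit of Fig. 3, not data from the paper.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]


def hull(*ivs: Interval) -> Interval:
    return (min(iv[0] for iv in ivs), max(iv[1] for iv in ivs))


def relax_path(inv: Dict[str, Interval], grd: Dict[Tuple[str, str], Interval],
               path: List[str], reach: List[Interval]):
    """Enlarge invariants and guards along path = [q_1..q_m] with reach = [P_0..P_m]."""
    m = len(path)
    if len(reach) != m + 1:
        raise ValueError("need one reachable switching set per position 0..m")
    inv, grd = dict(inv), dict(grd)  # leave the input automaton untouched
    for j in range(1, m + 1):
        q = path[j - 1]
        inv[q] = hull(inv[q], reach[j - 1], reach[j])
        if j < m:
            e = (q, path[j])
            grd[e] = hull(grd[e], reach[j])
    return inv, grd
```

With $\varepsilon = 0.25$, the hypothetical sets $P_0 = P_1 = [\![1, 1]\!]_\varepsilon$ and $P_2 = P_3 = [\![2, 2]\!]_\varepsilon$ along $\pi = q_1, q_0, q_1$ enlarge the invariant of $q_1$ from $[\![2, 2]\!]_\varepsilon$ to $[\![1, 2]\!]_\varepsilon = [0.75, 2.25]$, as in Fig. 3.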


Adaptation. If both the membership query and the relaxation procedure fail, the procedure $\mathrm{ADAPT}(H, f, \varepsilon, \pi)$ modifies the LHA $H$ for $\varepsilon$-capturing $f$. Conceptually, we construct a new path $\pi'$, based on some path $\pi$, and modify $H$ accordingly such that the graph of $H$ contains $\pi'$. Recalling Lemma 4, we need to ensure that every reachable switching set in $\mathcal{P}^{\pi'}$ is nonempty. We construct $\pi'$ by trying to preserve the modes in path $\pi$. If this is not possible, we try to replace them by existing modes in the LHA $H$ whenever possible, potentially adding new transitions. The last option is to create new modes. Finally, we extend the LHA $H$ by adding the new transitions and/or modes determined by the new path $\pi'$.

In more detail, given an LHA $H$, an $m$-PWL function $f$, and a path $\pi = q_1, \ldots, q_m$ in $H$, we start with path $\pi' = \pi$. Then, the adaptation procedure checks whether there is an empty reachable switching set in $\mathcal{P}^{\pi'}$. Every time we detect emptiness of the set $P^{\pi'}_j$ for some $0 \leq j \leq m$, a mode in the path $\pi'$ is replaced in order to make $P^{\pi'}_j$ nonempty. We first try to replace the mode $q_{j+1}$ if it exists. If $P^{\pi'}_j$ is still empty or $q_{j+1}$ does not exist, we repeat the replacement for $q_j, q_{j-1}$, and so on, until $P^{\pi'}_j$ finally becomes nonempty.

For the replacement of the $j$-th mode $q$ in the path $\pi'$ we follow two strategies. The first strategy is to replace the mode $q$ by an existing mode $q' \neq q$ in $H$ such that $\mathit{Flow}_H(q')$ is similar to $\mathit{slope}(p_j)$. Formally, let $T$ be the duration of piece $p_j$. Then $\mathit{Flow}_H(q')$ is similar to $\mathit{slope}(p_j)$ if $\|\mathit{init}(p_j) + T \cdot \mathit{Flow}_H(q') - \mathit{end}(p_j)\| \leq 2\varepsilon$. See Fig. 2(b) for an example. If the first strategy fails, the second strategy is to create a new mode $q^*$ with flow $\mathit{newflow}(q^*) = \mathit{slope}(p_j)$ for replacement in $\pi'$. We denote the set of existing modes similar to some mode $q$ in $\pi$ by $\mathit{sim}(\pi')$, and the set of new modes $q^*$ by $\mathit{new}(\pi')$. Once the path $\pi'$ is constructed, the adaptation of the LHA $H$ is performed with respect to $\pi'$. Figure 4 exemplifies the adaptation of the LHA in Fig. 1.
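The similarity test and the choice between the two replacement strategies can be sketched for $n = 1$ as follows. Picking the most similar existing mode when several qualify is our own tie-breaking assumption, since the text only requires some similar mode $q' \neq q$; the returned tags `"existing"` and `"new"` and the function names are hypothetical.

```python
from typing import Dict, Tuple, Union

Piece = Tuple[float, float, float, float]  # (t0, t1, x0, x1)


def deviation(a: float, p: Piece) -> float:
    """|init(p) + T * a - end(p)| with T the duration of p."""
    t0, t1, x0, x1 = p
    return abs(x0 + (t1 - t0) * a - x1)


def is_similar(a: float, p: Piece, eps: float) -> bool:
    return deviation(a, p) <= 2 * eps


def replacement(q: str, flow: Dict[str, float], p: Piece,
                eps: float) -> Tuple[str, Union[str, float]]:
    """First strategy: reuse an existing similar mode q' != q.
    Second strategy: request a new mode q* with newflow(q*) = slope(p)."""
    candidates = [r for r, a in flow.items() if r != q and is_similar(a, p, eps)]
    if candidates:
        return ("existing", min(candidates, key=lambda r: deviation(flow[r], p)))
    t0, t1, x0, x1 = p
    return ("new", (x1 - x0) / (t1 - t0))
```

For a piece from $(0, 1)$ to $(1, 2)$ and $\varepsilon = 0.25$, flows $1.0$ and $1.4$ are similar (deviations $0$ and $0.4$, both at most $2\varepsilon = 0.5$), whereas flow $0$ is not.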
Fig. 3. Example describing the procedure $\mathrm{RELAXPATH}(H, f, \varepsilon, \pi)$ for $H$ given in Fig. 1, $f = f_2$ (depicted on the left), and path $\pi = q_1, q_0, q_1$. The algorithm increases the invariant of mode $q_1$ by computing the convex hull of the old invariant $[\![2, 2]\!]_\varepsilon$ and the set $[\![1, 1]\!]_\varepsilon$. Analogously, the guard of the transition $(q_1, q_0)$ is increased.


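Definition 10. The adaptation of the LHA $H = (Q, E, X, \mathit{Flow}, \mathit{Inv}, \mathit{Grd})$ with respect to an $m$-PWL function $f$ with affine pieces $p_1, \ldots, p_m$ and a path $\pi' = q_1, \ldots, q_m$ is the LHA $H' = (Q', E', X, \mathit{Flow}', \mathit{Inv}', \mathit{Grd}')$ defined as:

- $Q' := Q \cup \mathit{new}(\pi')$,

- $E' := E \cup \{(q_j, q_{j+1}) \mid 1 \leq j < m\}$,

- $\mathit{Flow}'(q) := \mathit{newflow}(q)$ if $q \in \mathit{new}(\pi')$, and $\mathit{Flow}'(q) := \mathit{Flow}(q)$ otherwise,

- $$\mathit{Inv}'(q) := \begin{cases} \mathit{chull}\big(\bigcup_{j : q_j = q} (P^{\pi'}_{j-1} \cup P^{\pi'}_j)\big) & \text{if } q \in \mathit{new}(\pi'), \\ \mathit{chull}\big(\mathit{Inv}(q) \cup \bigcup_{j : q_j = q} (P^{\pi'}_{j-1} \cup P^{\pi'}_j)\big) & \text{if } q \in \mathit{sim}(\pi'), \\ \mathit{Inv}(q) & \text{otherwise}, \end{cases}$$

- $$\mathit{Grd}'((q, q')) := \begin{cases} \mathit{chull}\big(\bigcup_{j : q_j = q, q_{j+1} = q'} P^{\pi'}_j\big) & \text{if } q \in \mathit{new}(\pi') \text{ or } q' \in \mathit{new}(\pi'), \\ \mathit{chull}\big(\mathit{Grd}((q, q')) \cup \bigcup_{j : q_j = q, q_{j+1} = q'} P^{\pi'}_j\big) & \text{if } q \in \mathit{sim}(\pi') \text{ or } q' \in \mathit{sim}(\pi'), \\ \mathit{Grd}((q, q')) & \text{otherwise}. \end{cases}$$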


If there is no path of length $m$ in the graph of $H$, we choose a shorter path $\pi$ in $H$ of length $m'$ for the adaptation procedure. Then, for every position $j > m'$, we define the reachable switching set $P^\pi_j$ as an empty set and proceed as usual.
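A 1-D sketch of Definition 10, given the adapted path $\pi'$, the new modes with their flows, the set $\mathit{sim}(\pi')$, and the reachable switching sets $P^{\pi'}_0, \ldots, P^{\pi'}_m$, which we assume have already been computed and are nonempty. Definition 10 does not specify $\mathit{Grd}((q, q'))$ for a transition that is new in $E'$; using only the new sets in that case is our assumption. All names and the example sets are illustrative.

```python
from typing import Dict, List, Set, Tuple

Interval = Tuple[float, float]


def hull(*ivs: Interval) -> Interval:
    return (min(iv[0] for iv in ivs), max(iv[1] for iv in ivs))


def adapt(flow: Dict[str, float], inv: Dict[str, Interval],
          grd: Dict[Tuple[str, str], Interval], path: List[str],
          newflow: Dict[str, float], sim: Set[str], reach: List[Interval]):
    """Return (Flow', Inv', Grd') for path = [q_1..q_m] and reach = [P_0..P_m]."""
    m = len(path)
    flow2 = {**flow, **newflow}  # Q' = Q u new(pi'), new modes get newflow
    inv2, grd2 = dict(inv), dict(grd)
    for q in set(path):
        # P_{j-1} and P_j for every position j with q_j = q
        sets = [s for j in range(1, m + 1) if path[j - 1] == q
                for s in (reach[j - 1], reach[j])]
        if q in newflow:
            inv2[q] = hull(*sets)
        elif q in sim:
            inv2[q] = hull(inv[q], *sets)
    for e in {(path[j - 1], path[j]) for j in range(1, m)}:  # E' adds (q_j, q_{j+1})
        q, r = e
        sets = [reach[j] for j in range(1, m) if (path[j - 1], path[j]) == e]
        if q in newflow or r in newflow or e not in grd:  # assumption for new edges
            grd2[e] = hull(*sets)
        elif q in sim or r in sim:
            grd2[e] = hull(grd[e], *sets)
    return flow2, inv2, grd2
```

With the hypothetical sets $P_1 = [1.25, 1.75]$ and $P_2 = [1.75, 2.25]$ on the path $q_1, q^*, q_1$, the new mode $q^*$ receives the invariant $[1.25, 2.25]$ and the two new transitions receive the guards $P_1$ and $P_2$, while modes outside $\mathit{new}(\pi') \cup \mathit{sim}(\pi')$ keep their invariants.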


4.2 Discussion 


The construction of the initial LHA (line 1 in Algorithm 1) can be modified to cluster pieces with similar slopes. This can help reduce the number of modes in the initial automaton, but does not guarantee that the first PWL function $f_0$ is $\varepsilon$-captured. To fix this, $f_0$ can be included in the loop of Algorithm 1.

Algorithm 1 follows a local repair strategy, based on a single PWL function. Thanks to this, the algorithm can be used in an online setting where new data arrives after the algorithm has started. However, the resulting model is influenced by the order in which the algorithm processes the functions $f \in F$. In the simple case that $F$ only contains affine functions with the same slope, all models resulting from different processing orders will consist of a single mode with the same flow, and the invariant bounds differ by at most $\varepsilon$. Furthermore, for a precision value $\varepsilon = 0$, the result is always order-independent.

Fig. 4. Example describing the procedure $\mathrm{ADAPT}(H, f, \varepsilon, \pi)$ for the LHA $H$ in Fig. 1 with respect to the 3-PWL function $f = f_3$, the path $\pi = q_1, q_0, q_1$, and $\varepsilon = 0.25$. The initial reachable switching set $P^\pi_0$ is the projection of the set $P$ on state $x$. Considering the flows in $q_1$ and $q_0$, the next reachable switching set $P^\pi_1$ is the projection of the set $Q$ on state $x$. Observe that from $Q$, using the flow of $q_1$, the reachable switching set $P^\pi_2$ is empty. We thus add a new mode $q^*$ and obtain the new path $\pi' = q_1, q^*, q_1$.

We now discuss the restrictions of the models we obtain from Algorithm 1. 
We did not include a set of initial states in our presentation, but the gener- 
alization is straightforward. Our transitions do not include assignments, which 
would make executions discontinuous. The usual assumption in many applica- 
tion domains, e.g., life sciences, is that the underlying system is continuous, so 
having assignments would not be desirable. In the setting where the input is 
given as time-series data, discrete events would typically be approximated by 
steep slopes in the PWL function. In the setting where the input is given as discontinuous PWL functions $f$, in order to $\varepsilon$-capture $f$, one would generally require that the automaton switches synchronously with $f$ (cf. Sect. 3.1), instead of asynchronous switching as in our algorithm. Under this additional assumption, we can pose the procedures MEMBERSHIP and RELAXPATH as a single linear program (similar to formula $\varphi_{f,\varepsilon}$). This linear program can also be used to identify assignments.

The continuous dynamics of our models are defined by constant differential 
equations. As mentioned before, this class generally suffices to approximate an 
arbitrary continuous function (by increasing the number of modes). Our approach could be extended to polyhedral differential inclusions (also called linear envelopes) by merging modes with “similar” dynamics. This may, however, lead to the dilemma that several modes are equally similar.


4.3 Theoretical Properties of the Membership-based Synthesis 


The following theorem asserts that Algorithm 1 solves Problem 3. 
Theorem 2 (Soundness and precision). Given a finite set of PWL functions $F$ and a value $\varepsilon \in \mathbb{R}_{\geq 0}$, let $H$ be an automaton resulting from $\mathrm{SYNTHESIS}(F, \varepsilon)$. Then $H$ both $\varepsilon$-captures all functions in $F$ and is $\varepsilon$-precise with respect to $F$.


Algorithm 1 satisfies a completeness property in the following sense. For every 
model H from a certain class we can find a set F of PWL functions and a value 
$\varepsilon$ such that $\mathrm{SYNTHESIS}(F, \varepsilon)$ results in $H$. Before we can characterize the class
of models, we first need to introduce some terminology. 


Definition 11. Let $q \in Q$ be a mode with invariant $X = \mathit{Inv}(q)$ and flow $\mathit{Flow}(q)$. We call a continuous state $\mathbf{x}_2 \in X$ forward reachable in $q$ if there is a continuous state $\mathbf{x}_1 \in X$ such that $\mathbf{x}_2$ is reachable from $\mathbf{x}_1$ by just letting time pass, i.e., $\exists t > 0 : \mathbf{x}_2 = \mathbf{x}_1 + \mathit{Flow}(q) \cdot t$. Analogously, we call a state $\mathbf{x}_1 \in X$ backward reachable in $q$ if there is a state $\mathbf{x}_2 \in X$ such that $\mathbf{x}_2$ is reachable from $\mathbf{x}_1$. A continuous state is dead in $q$ if it is neither forward reachable nor backward reachable in $q$.


We characterize the class of automata $H = (Q, E, X, \mathit{Flow}, \mathit{Inv}, \mathit{Grd})$ for which the algorithm is complete by considering the following assumptions: (1) no invariant contains a dead continuous state, and if $e = (q_1, q_2)$ is a transition, then all continuous states in the guard $\mathit{Grd}(e)$ are forward reachable in $q_1$ and backward reachable in $q_2$; and (2) no two modes have the same slope.

Roughly speaking, Assumption (1) asserts that, after every switch, an exe- 
cution can stay in the new mode for a positive amount of time. 
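Under the same one-dimensional simplification (interval invariants, interval guards, and a hypothetical dictionary encoding of modes and transitions), Assumptions (1) and (2) can be checked directly. The helpers below restate the reachability tests compactly so that the sketch runs on its own.

```python
def _fwd(x, lo, hi, c):
    # x is reached from some state of [lo, hi] after time t > 0 with x' = c
    return lo <= x <= hi and (c == 0 or (x > lo if c > 0 else x < hi))


def _bwd(x, lo, hi, c):
    # some state of [lo, hi] is reached from x after time t > 0 with x' = c
    return lo <= x <= hi and (c == 0 or (x < hi if c > 0 else x > lo))


def satisfies_assumptions(modes, transitions):
    """modes: name -> (lo, hi, slope); transitions: (q1, q2) -> (g_lo, g_hi)."""
    # (1a) No dead states: in 1-D this fails only for a point invariant
    # with nonzero slope.
    if any(lo == hi and c != 0 for lo, hi, c in modes.values()):
        return False
    # (1b) Guard states must be forward reachable in q1 and backward reachable
    # in q2. Both sets are intervals, so checking the guard endpoints suffices.
    for (q1, q2), (g_lo, g_hi) in transitions.items():
        for x in (g_lo, g_hi):
            if not (_fwd(x, *modes[q1]) and _bwd(x, *modes[q2])):
                return False
    # (2) No two modes share a slope.
    slopes = [c for _, _, c in modes.values()]
    return len(slopes) == len(set(slopes))


modes = {"up": (0.0, 10.0, 1.0), "down": (0.0, 10.0, -1.0)}
print(satisfies_assumptions(modes, {("up", "down"): (10.0, 10.0),
                                    ("down", "up"): (0.0, 0.0)}))   # True
print(satisfies_assumptions(modes, {("up", "down"): (0.0, 0.0)}))  # False
```

The second model violates Assumption (1): a switch at x = 0 out of "up" would require reaching 0 by flowing upward, which no state of the invariant allows.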


Theorem 3 (Completeness). Given an LHA H satisfying Assumptions (1) 
and (2), there exist PWL functions F such that SYNTHESIS(F,0) results in H. 


5 Experimental Results 


In this section, we present the experiments used to evaluate our algorithm. The 
algorithm was implemented in Python and relies on the standard scientific com- 
putation packages. For the computations involving polyhedra we used the pplpy 
wrapper to the Parma Polyhedra Library [4]. 


Case Study: Online Synthesis. We evaluate the precision of our algorithm by 
collecting data from the executions of existing linear hybrid automata. For each 
given automaton, we randomly sample ten executions and pass them to our algo- 
rithm, which then constructs a new model. After that, we run our algorithm with 
another 90 executions, but we reuse the intermediate model, thus demonstrating 
the online feature of the algorithm. We show the different models for two hand-crafted examples in Table 1. We tried both sampling from random states and
from a fixed state. The examples show the latter case, which makes sampling 
the complete state-space and thus learning a precise model harder. 

The first example contains a sink with two incoming transitions, which 
requires at least two simulations to observe both transitions. Consequently, the 
algorithm had to make use of the adaptation step at least once to add one of the 
Table 1. Synthesis results for two automaton models. The original model is shown in blue. The synthesis result after 10 iterations is shown in bright red, and after another 90 iterations in dark red. On the bottom left, we show three sample executions starting from the same point (top: original model, bottom: synthesized model after 100 iterations). We used ε = 0.2 in all cases. Numbers are rounded to two decimal places.


[Diagrams of the original and synthesized automata for both examples, annotated with mode invariants such as x ∈ [1.58, 9.02], and plots of the three sample executions]


transitions. In the second example, some parts of the state-space are explored 
less frequently by the sampled executions. Hence the first model obtained after 
ten iterations does not represent all behavior of the original model yet. After 
the additional 90 iterations, the remaining parts of the state space have been 
visited, which is reflected in the precise bounds of the resulting model. In the 
table, we also show three sample executions from both the original and the final 
synthesized automaton to illustrate the similarity in the dynamical behavior. 
Fig. 5. Results for the cell model. Top: synthesized model using our algorithm. Bottom:
three input traces (left) and random simulations of the synthesized model (right). 


Case Study: Cell Model. In this case study, we synthesize a hybrid automaton from voltage traces of excitable cells. Excitable cells are an important class of cells comprising neurons, cardiac cells, and other muscle cells. The main property of excitable cells is that they exhibit electrical activity, which in the case of neurons enables signal transmission and in the case of muscle cells allows them to contract. The excitation signal usually follows distinct dynamics called action potential. Grosu et al. construct a cycle-linear hybrid automaton from action-potential traces of cardiac cells [8]. In their model, they identify six modes, two of which exhibit the same dynamics and are just used to model an input signal.

Our algorithm successfully synthesizes a model, depicted in Fig. 5, consisting 
of five modes that roughly match the normal phases of an action potential. We 
evaluate the quality of the synthesized model by simulating random executions 
and visually comparing to the original data (see the bottom of Fig. 5). 


6 Conclusion 


In this paper, we have presented two fully automatic approaches to synthesize a linear hybrid automaton from data. As key features, the synthesized automaton captures the data up to a user-defined bound and is tight. Moreover, the online feature of the membership-based approach allows combining the approach with alternative synthesis techniques, e.g., for constructing initial models.

A future line of work is to design a methodology for identification of weak 
generalizations in the model, and use them for driving the experiments and, in 
consequence, adjusting the model. We would first synthesize a model as before, 
but then identify the aspects of the model that are least substantiated by the 
data (e.g., areas in the state space or specific sequences in the executions). Then
we would query the system for data about those aspects, and repair the model 
accordingly. As another line of work, we plan to extend the approach to go 
from dynamics defined by piecewise-constant differential equations toward linear 
envelopes. Our approach can be seen as a generalization, to LHA, of Angluin’s 
algorithm for constructing a finite-state machine from finite traces [3], and we 
plan to pursue this connection further. 


References 


1. Alur, R., Courcoubetis, C., Henzinger, T.A., Ho, P.-H.: Hybrid automata: an algorithmic approach to the specification and verification of hybrid systems. In: Grossman, R.L., Nerode, A., Ravn, A.P., Rischel, H. (eds.) HS 1991-1992. LNCS, vol. 736, pp. 209-229. Springer, Heidelberg (1993). https://doi.org/10.1007/3-540-57318-6_30

2. Alur, R., Kurshan, R.P., Viswanathan, M.: Membership questions for timed and hybrid automata. In: RTSS, pp. 254-263. IEEE Computer Society (1998). https://doi.org/10.1109/REAL.1998.739751

3. Angluin, D.: Learning regular sets from queries and counterexamples. Inf. Comput. 75(2), 87-106 (1987). https://doi.org/10.1016/0890-5401(87)90052-6

4. Bagnara, R., Hill, P.M., Zaffanella, E.: The Parma Polyhedra Library: toward a complete set of numerical abstractions for the analysis and verification of hardware and software systems. Sci. Comput. Program. 72(1-2), 3-21 (2008). https://doi.org/10.1016/j.scico.2007.08.001

5. Bemporad, A., Garulli, A., Paoletti, S., Vicino, A.: A bounded-error approach to piecewise affine system identification. IEEE Trans. Autom. Control 50(10), 1567-1580 (2005). https://doi.org/10.1109/TAC.2005.856667

6. Douglas, D.H., Peucker, T.K.: Algorithms for the reduction of the number of points required to represent a digitized line or its caricature. Cartographica 10(2), 112-122 (1973)

7. Garulli, A., Paoletti, S., Vicino, A.: A survey on switched and piecewise affine system identification. IFAC Proc. Vol. 45(16), 344-355 (2012). https://doi.org/10.3182/20120711-3-BE-2027.00332

8. Grosu, R., Mitra, S., Ye, P., Entcheva, E., Ramakrishnan, I.V., Smolka, S.A.: Learning cycle-linear hybrid automata for excitable cells. In: Bemporad, A., Bicchi, A., Buttazzo, G. (eds.) HSCC 2007. LNCS, vol. 4416, pp. 245-258. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-71493-4_21

9. Hakimi, S.L., Schmeichel, E.F.: Fitting polygonal functions to a set of points in the plane. CVGIP Graph. Model. Image Process. 53(2), 132-136 (1991). https://doi.org/10.1016/1049-9652(91)90056-P

10. Hashambhoy, Y., Vidal, R.: Recursive identification of switched ARX models with unknown number of models and unknown orders. In: CDC, pp. 6115-6121 (2005). https://doi.org/10.1109/CDC.2005.1583140

11. Henzinger, T.A.: The theory of hybrid automata. In: Inan, M.K., Kurshan, R.P. (eds.) Verification of Digital and Hybrid Systems. NATO ASI Series (Series F: Computer and Systems Sciences), vol. 170, pp. 265-292. Springer, Berlin, Heidelberg (2000). https://doi.org/10.1007/978-3-642-59615-5_13




12. Lamrani, I., Banerjee, A., Gupta, S.K.S.: HyMn: mining linear hybrid automata from input output traces of cyber-physical systems. In: ICPS, pp. 264-269. IEEE (2018). https://doi.org/10.1109/ICPHYS.2018.8387670

13. Liberzon, D.: Switching in Systems and Control. Birkhäuser, Boston (2003). https://doi.org/10.1007/978-1-4612-0017-8

14. Ly, D.L., Lipson, H.: Learning symbolic representations of hybrid dynamical systems. JMLR 13, 3585-3618 (2012). http://dl.acm.org/citation.cfm?id=2503356

15. Medhat, R., Ramesh, S., Bonakdarpour, B., Fischmeister, S.: A framework for mining hybrid automata from input/output traces. In: EMSOFT, pp. 177-186. IEEE (2015). https://doi.org/10.1109/EMSOFT.2015.7318273

16. Niggemann, O., Stein, B., Vodencarevic, A., Maier, A., Kleine Büning, H.: Learning behavior models for hybrid timed systems. In: AAAI. AAAI Press (2012). http://www.aaai.org/ocs/index.php/AAAI/AAAI12/paper/view/4993

17. Ozay, N.: An exact and efficient algorithm for segmentation of ARX models. In: ACC, pp. 38-41. IEEE (2016). https://doi.org/10.1109/ACC.2016.7524888

18. Paoletti, S., Juloski, A.L., Ferrari-Trecate, G., Vidal, R.: Identification of hybrid systems: a tutorial. Eur. J. Control 13(2-3), 242-260 (2007). https://doi.org/10.3166/ejc.13.242-260

19. Skeppstedt, A., Lennart, L., Millnert, M.: Construction of composite models from observed data. Int. J. Control 55(1), 141-152 (1992). https://doi.org/10.1080/00207179208934230

20. Summerville, A., Osborn, J.C., Mateas, M.: CHARDA: causal hybrid automata recovery via dynamic analysis. In: IJCAI, pp. 2800-2806. ijcai.org (2017). https://doi.org/10.24963/ijcai.2017/390

21. Verwer, S.: Efficient identification of timed automata: theory and practice. Ph.D. thesis, Delft University of Technology, Netherlands (2010). http://resolver.tudelft.nl/uuid:61d9f199-7b01-45be-aGed-04498113a212

22. Vidal, R., Anderson, B.D.O.: Recursive identification of switched ARX hybrid models: exponential convergence and persistence of excitation. In: CDC, vol. 1, pp. 32-37 (2004). https://doi.org/10.1109/CDC.2004.1428602


Open Access This chapter is licensed under the terms of the Creative Commons 
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), 
which permits use, sharing, adaptation, distribution and reproduction in any medium 
or format, as long as you give appropriate credit to the original author(s) and the 
source, provide a link to the Creative Commons license and indicate if changes were 
made. 

The images or other third party material in this chapter are included in the 
chapter’s Creative Commons license, unless indicated otherwise in a credit line to the 
material. If material is not included in the chapter’s Creative Commons license and 
your intended use is not permitted by statutory regulation or exceeds the permitted 
use, you will need to obtain permission directly from the copyright holder. 




Overfitting in Synthesis: Theory 
and Practice 


Saswat Padhi¹(✉), Todd Millstein¹, Aditya Nori², and Rahul Sharma³


1 University of California, Los Angeles, USA 
{padhi,todd}@cs.ucla.edu 
2 Microsoft Research, Cambridge, UK 
adityan@microsoft.com 
3 Microsoft Research, Bengaluru, India 
rahsha@microsoft.com 


Abstract. In syntax-guided synthesis (SyGuS), a synthesizer’s goal is 
to automatically generate a program belonging to a grammar of possi- 
ble implementations that meets a logical specification. We investigate 
a common limitation across state-of-the-art SyGuS tools that perform 
counterexample-guided inductive synthesis (CEGIS). We empirically 
observe that as the expressiveness of the provided grammar increases, 
the performance of these tools degrades significantly. 

We claim that this degradation is not only due to a larger search 
space, but also due to overfitting. We formally define this phenomenon 
and prove no-free-lunch theorems for SyGuS, which reveal a fundamental 
tradeoff between synthesizer performance and grammar expressiveness. 

A standard approach to mitigate overfitting in machine learning is to 
run multiple learners with varying expressiveness in parallel. We demon- 
strate that this insight can immediately benefit existing SyGuS tools. 
We also propose a novel single-threaded technique called hybrid enumer- 
ation that interleaves different grammars and outperforms the winner 
of the 2018 SyGuS competition (Inv track), solving more problems and 
achieving a 5x mean speedup. 


1 Introduction 


The syntax-guided synthesis (SyGuS) framework [3] provides a unified format to 
describe a program synthesis problem by supplying (1) a logical specification for 
the desired functionality, and (2) a grammar of allowed implementations. Given 
these two inputs, a SyGuS tool searches through the programs that are permitted 
by the grammar to generate one that meets the specification. Today, SyGuS is at 
the core of several state-of-the-art program synthesizers [5,14,23,24,29], many 
of which compete annually in the SyGuS competition [1,4]. 

We demonstrate empirically that five state-of-the-art SyGuS tools are very 
sensitive to the choice of grammar. Increasing grammar expressiveness allows the 
tools to solve some problems that are unsolvable with less-expressive grammars. 
However, it also causes them to fail on many problems that the tools are able 
to solve with a less expressive grammar. We analyze the latter behavior both 
theoretically and empirically and present techniques that make existing tools
much more robust in the face of increasing grammar expressiveness. 

We restrict our investigation to a widely used approach [6] to SyGuS called 
counterexample-guided inductive synthesis (CEGIS) [37, §5]. In this approach,
the synthesizer is composed of a learner and an oracle. The learner iteratively 
identifies a candidate program that is consistent with a given set of examples (ini- 
tially empty) and queries the oracle to either prove that the program is correct, 
i.e., meets the given specification, or obtain a counterexample that demonstrates 
that the program does not meet the specification. The counterexample is added 
to the set of examples for the next iteration. The iterations continue until a 
correct program is found or resource/time budgets are exhausted. 
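The CEGIS loop just described can be sketched in a few lines of Python. Everything here is a hypothetical toy: the specification f(x) = 2x + 1, the finite domain the oracle checks, and the grammar of linear expressions a·x + b. The learner enumerates the grammar in a fixed order and returns the first expression consistent with the examples; the oracle checks the candidate exhaustively on the domain.

```python
from itertools import product

DOMAIN = range(-10, 11)                          # finite input domain (toy)
GRAMMAR = list(product(range(-3, 4), repeat=2))  # (a, b) encodes a*x + b


def spec(f, x):
    """Hypothetical specification: f(x) must equal 2*x + 1."""
    return f(x) == 2 * x + 1


def as_function(expr):
    a, b = expr
    return lambda x: a * x + b


def learner(examples):
    """Return the first grammar expression consistent with all examples."""
    for expr in GRAMMAR:
        f = as_function(expr)
        if all(f(x) == y for x, y in examples):
            return expr
    return None


def oracle(expr):
    """Return None if expr meets the spec on DOMAIN, else a counterexample (x, y)."""
    f = as_function(expr)
    for x in DOMAIN:
        if not spec(f, x):
            return (x, 2 * x + 1)
    return None


def cegis(max_rounds=50):
    examples = []                            # initially empty
    for rounds in range(1, max_rounds + 1):
        candidate = learner(examples)
        if candidate is None:
            return None, rounds              # no expression fits the examples
        cex = oracle(candidate)
        if cex is None:
            return candidate, rounds         # candidate proved correct
        examples.append(cex)                 # refine with the counterexample
    return None, max_rounds                  # budget exhausted


print(cegis())  # ((2, 1), 2)
```

The first candidate fails at x = -10; with that single counterexample, the enumeration order already reaches the correct expression in the second round.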


Overfitting. To better understand the observed performance degradation, we 
instrumented one of these SyGuS tools (Sect. 2.2). We empirically observe that 
for a large number of problems, the performance degradation on increasing gram- 
mar expressiveness is often accompanied by a significant increase in the number 
of counterexamples required. Intuitively, as grammar expressiveness increases so 
does the number of spurious candidate programs, which satisfy a given set of 
examples but violate the specification. If the learner picks such a candidate, then 
the oracle generates a counterexample, the learner searches again, and so on. 

In other words, increasing grammar expressiveness increases the chances for 
overfitting, a well-known phenomenon in machine learning (ML). Overfitting 
occurs when a learned function explains a given set of observations but does not 
generalize correctly beyond it. Since SyGuS is indeed a form of function learning, 
it is perhaps not surprising that it is prone to overfitting. However, we identify 
its specific source in the context of SyGuS—the spurious candidates induced by 
increasing grammar expressiveness—and show that it is a significant problem 
in practice. We formally define the potential for overfitting (Ω) in Definition 7,
which captures the number of spurious candidates. 
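The growth of spurious candidates with grammar expressiveness can be illustrated by brute force. In the hypothetical setup below, the intended function is x + 1, the learner has observed the single example (0, 1), and we count expressions that fit this example but disagree with x + 1 somewhere on a small domain, for three grammars of increasing expressiveness. This is only an informal count, not the formal measure Ω of Definition 7.

```python
from itertools import product

DOMAIN = range(-5, 6)
EXAMPLES = [(0, 1)]                     # the single observed IO example


def target(x):
    return x + 1                        # hypothetical intended function


def count_spurious(grammar):
    """Count candidates that fit EXAMPLES but differ from target on DOMAIN."""
    count = 0
    for f in grammar:
        fits = all(f(x) == y for x, y in EXAMPLES)
        correct = all(f(x) == target(x) for x in DOMAIN)
        count += fits and not correct
    return count


cs = range(-3, 4)
small = [lambda x, c=c: x + c for c in cs]
medium = [lambda x, a=a, c=c: a * x + c for a, c in product(cs, repeat=2)]
large = [lambda x, a=a, b=b, c=c: a * x * x + b * x + c
         for a, b, c in product(cs, repeat=3)]

print([count_spurious(g) for g in (small, medium, large)])  # [0, 6, 48]
```

The least expressive grammar admits no spurious candidate, whereas the quadratic grammar admits 48, any of which a learner could pick before the oracle rules it out.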


No Free Lunch. In the ML community, this tradeoff between expressiveness 
and overfitting has been formalized for various settings as no-free-lunch (NFL) 
theorems [34, §5.1]. Intuitively, such a theorem says that for every learner there
exists a function that cannot be efficiently learned, where efficiency is defined by 
the number of examples required. We have proven corresponding NFL theorems 
for the CEGIS-based SyGuS setting (Theorems 1 and 2). 

A key difference between the ML and SyGuS settings is the notion of 
m-learnability. In the ML setting, the learned function may differ from the true 
function, as long as this difference (expressed as an error probability) is rela- 
tively small. However, because the learner is allowed to make errors, it is in turn 
required to learn given an arbitrary set of m examples (drawn from some dis- 
tribution). In contrast, the SyGuS learning setting is all-or-nothing—either the 
tool synthesizes a program that meets the given specification or it fails. There- 
fore, it would be overly strong to require the learner to handle an arbitrary set 
of examples. 




Instead, we define a much weaker notion of m-learnability for SyGuS, which 
only requires that there exist a set of m examples for which the learner succeeds. 
Yet, our NFL theorem shows that even this weak notion of learnability can always 
be thwarted: given an integer m > 0 and an expressive enough (as a function 
of m) grammar, for every learner there exists a SyGuS problem that cannot be 
learned without access to more than m examples. We also prove that overfitting 
is inevitable with an expressive enough grammar (Theorems 3 and 4) and that 
the potential for overfitting increases with grammar expressiveness (Theorem 5). 


Mitigating Overfitting. Inspired by ensemble methods [13] in ML, which aggre- 
gate results from multiple learners to combat overfitting (and underfitting), we 
propose PLEARN—a black-box framework that runs multiple parallel instances 
of a SyGuS tool with different grammars. Although prior SyGuS tools run mul- 
tiple instances of learners with different random seeds [7,20], to our knowledge, 
this is the first proposal to explore multiple grammars as a means to improve 
the performance of SyGuS. Our experiments indicate that PLEARN significantly 
improves the performance of five state-of-the-art SyGuS tools—CVC4 [7,33], 
EUSOLVER [5], LOOPINVGEN [29], SKETCHAC [20,37], and STOCH [3, III F].
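The idea behind PLEARN, running one instance per grammar and keeping the first success, can be sketched with Python's `concurrent.futures`. The `run_learner` function below is a hypothetical brute-force stand-in over toy linear and quadratic grammars, not any of the five tools. A real deployment would launch separate solver processes and terminate the remaining ones once one instance succeeds.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from itertools import product

DOMAIN = range(-5, 6)


def target(x):
    return x * x - 1                      # hypothetical specification


def make_grammar(name):
    """Toy grammars of increasing expressiveness: (text, function) pairs."""
    cs = range(-3, 4)
    if name == "linear":
        return [(f"{a}*x+{c}", lambda x, a=a, c=c: a * x + c)
                for a, c in product(cs, repeat=2)]
    return [(f"{a}*x*x+{b}*x+{c}", lambda x, a=a, b=b, c=c: a * x * x + b * x + c)
            for a, b, c in product(cs, repeat=3)]


def run_learner(grammar_name):
    """Stand-in for one black-box synthesizer run restricted to one grammar."""
    for text, f in make_grammar(grammar_name):
        if all(f(x) == target(x) for x in DOMAIN):
            return grammar_name, text
    return grammar_name, None             # the grammar cannot express a solution


def plearn(grammar_names):
    """Run one learner per grammar in parallel and return the first success."""
    with ThreadPoolExecutor(max_workers=len(grammar_names)) as pool:
        futures = [pool.submit(run_learner, g) for g in grammar_names]
        for fut in as_completed(futures):
            name, solution = fut.result()
            if solution is not None:
                return name, solution
    return None


print(plearn(["linear", "quadratic"]))  # ('quadratic', '1*x*x+0*x+-1')
```

Threads keep the sketch short; leaving the `with` block still waits for the other workers, and for CPU-bound solvers separate processes would be the natural choice.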

However, running parallel instances of a synthesizer is computationally 
expensive. Hence, we also devise a white-box approach, called hybrid enumera- 
tion, that extends the enumerative synthesis technique [2] to efficiently interleave 
exploration of multiple grammars in a single SyGuS instance. We implement 
hybrid enumeration within LOOPINVGEN¹ and show that the resulting single-threaded learner, LOOPINVGEN+HE, has negligible overhead but achieves performance comparable to that of PLEARN for LOOPINVGEN. Moreover, LOOPINVGEN+HE significantly outperforms the winner [28] of the invariant-synthesis (Inv) track of the 2018 SyGuS competition [4], a variant of LOOPINVGEN specifically tuned for the competition, achieving a 5× mean speedup and solving two SyGuS problems that no tool in the competition could solve.


Contributions. In summary, we present the following contributions: 


(Section 2) We empirically observe that, in many cases, increasing grammar 
expressiveness degrades performance of existing SyGuS tools due to over- 
fitting. 

(Section 3) We formally define overfitting and prove no-free-lunch theorems for 
the SyGuS setting, which indicate that overfitting with increasing grammar 
expressiveness is a fundamental characteristic of SyGuS. 

(Section 4) We propose two mitigation strategies: (1) a black-box technique that runs multiple parallel instances of a synthesizer, each with a different grammar, and (2) a single-threaded enumerative technique, called hybrid enumeration, that interleaves exploration of multiple grammars.

(Section 5) We show that incorporating these mitigating measures in existing 
tools significantly improves their performance. 


1 Our implementation is available at https://github.com/SaswatPadhi/LoopInvGen. 
2 Motivation


In this section, we first present empirical evidence that existing SyGuS tools are 
sensitive to changes in grammar expressiveness. Specifically, we demonstrate that 
as we increase the expressiveness of the provided grammar, every tool starts fail- 
ing on some benchmarks that it was able to solve with less-expressive grammars. 
We then investigate one of these tools in detail. 


2.1 Grammar Sensitivity of SyGuS Tools 
We evaluated 5 state-of-the-art SyGuS tools that use very different techniques: 


— SKETCHAC [20] extends the SKETCH synthesis system [37] by combining both 
explicit and symbolic search techniques. 

— STOCH [3, III F] performs a stochastic search for solutions.

— EUSOLVER [5] combines enumeration with unification strategies.

— Reynolds et al. [33] extend CVC4 [7] with a refutation-based approach.

— LOOPINVGEN [29] combines enumeration and Boolean function learning.


We ran these five tools on 180 invariant-synthesis benchmarks, which we describe in Sect. 5. We ran the benchmarks with each of the six grammars of quantifier-free predicates shown in Fig. 1. These grammars correspond to widely used abstract domains in the analysis of integer-manipulating programs: Equalities, Intervals [11], Octagons [25], Polyhedra [12], algebraic expressions (Polynomials), and arbitrary integer arithmetic (Peano) [30]. The *ₛ operator denotes scalar multiplication, e.g., (*ₛ 2 x), and *ₙ denotes nonlinear multiplication, e.g., (*ₙ x y).

    ⟨b⟩ ::= true | false | (Bool variables)
          | (not ⟨b⟩) | (or ⟨b⟩ ⟨b⟩) | (and ⟨b⟩ ⟨b⟩)
    ⟨i⟩ ::= (Int constants) | (Int variables)
    ▷ Additional rule in Equalities grammar:
    ⟨b⟩ +::= (= ⟨i⟩ ⟨i⟩)
    ▷ Additional rules in Intervals grammar:
    ⟨b⟩ +::= (>= ⟨i⟩ ⟨i⟩) | (<= ⟨i⟩ ⟨i⟩) | (> ⟨i⟩ ⟨i⟩) | (< ⟨i⟩ ⟨i⟩)
    ▷ Additional rules in Octagons grammar:
    ⟨i⟩ +::= (+ ⟨i⟩ ⟨i⟩) | (- ⟨i⟩ ⟨i⟩)
    ▷ Additional rule in Polyhedra grammar:
    ⟨i⟩ +::= (*ₛ ⟨i⟩ ⟨i⟩)
    ▷ Additional rule in Polynomials grammar:
    ⟨i⟩ +::= (*ₙ ⟨i⟩ ⟨i⟩)
    ▷ Additional rule in Peano grammar:
    ⟨i⟩ +::= (div ⟨i⟩ ⟨i⟩) | (mod ⟨i⟩ ⟨i⟩)

Fig. 1. Grammars of quantifier-free predicates over integers. (We use the +::= operator to append new rules to previously defined nonterminals.)

In Fig. 2, we report our findings on running each benchmark on each tool with each grammar, with a 30-minute wall-clock timeout. For each (tool, grammar) pair, the y-axis shows the number of failing benchmarks that the same tool is able to solve with a less-expressive grammar. We observe that, for each tool, the number of such failures increases with the grammar expressiveness. For instance, introducing the scalar multiplication operator (*ₛ) causes CVC4 to fail on 21 benchmarks that it is able to solve with Equalities (4/21), Intervals (18/21), or Octagons (10/21). Similarly, adding nonlinear multiplication causes LOOPINVGEN to fail on 10 benchmarks that it can solve with a less-expressive grammar.
[Bar chart: failures due to high expressiveness, per grammar and per tool (LOOPINVGEN, SKETCHAC, EUSOLVER, CVC4, STOCH)]

Fig. 2. For each grammar and each tool, the ordinate shows the number of benchmarks that fail with the grammar but are solvable with a less-expressive grammar.


                                        Increase (↑)   Unchanged (=)   Decrease (↓)
Expressiveness ↑ ∧ Time ↑ ⇒ Rounds?         27%            67%              6%
Expressiveness ↑ ∧ Rounds ↑ ⇒ Time?         79%             6%             15%

Fig. 3. Observed correlation between synthesis time and number of rounds, upon increasing grammar expressiveness, with LOOPINVGEN [29] on 180 benchmarks


2.2 Evidence for Overfitting 


To better understand this phenomenon, we instrumented LOOPINVGEN [29] to 
record the candidate expressions that it synthesizes and the number of CEGIS 
iterations (called rounds henceforth). We compare each pair of successful runs 
of each of our 180 benchmarks on distinct grammars.² In 65% of such pairs, we
observe performance degradation with the more expressive grammar. We also 
report the correlation between performance degradation and number of rounds 
for the more expressive grammar in each pair in Fig. 3. 

In 67% of the cases with degraded performance upon increased grammar 
expressiveness, the number of rounds remains unaffected—indicating that this 
slowdown is mainly due to a larger search space. However, there is significant evi- 
dence of performance degradation due to overfitting as well. We note an increase 
in the number of rounds for 27% of the cases with degraded performance. More- 
over, we notice performance degradation in 79% of all cases that required more 
rounds on increasing grammar expressiveness. 

Thus, a more expressive grammar not only increases the search space, but also 
makes it more likely for LOOPINVGEN to overfit—select a spurious expression, 
which the oracle rejects with a counterexample, hence requiring more rounds. In 
the remainder of this section, we demonstrate this overfitting phenomenon on 
the verification problem shown in Fig. 4, an example by Gulwani and Jojic [17], 
which is the fib_19 benchmark in the Inv track of SyGuS-Comp 2018 [4]. 


2 We ignore failing runs since they require an unknown number of rounds.
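For Fig. 4, we require an inductive invariant that is strong enough to prove that the assertion on line 6 always holds.

    1  assume (0 ≤ n ∧ 0 ≤ m ≤ n)
    2  assume (x = 0 ∧ y = m)
    3  while (x < n) do
    4    x ← x + 1
    5    if (x > m) then y ← y + 1
    6  assert (y = n)

Fig. 4. The fib_19 benchmark [17]

In the SyGuS setting, we need to synthesize a predicate $I\colon \mathbb{Z}^4 \to \mathbb{B}$³ defined on a symbolic state $\sigma = \langle m, n, x, y \rangle$ that satisfies $\forall \sigma\colon \phi(I, \sigma)$ for the specification

$$
\begin{aligned}
\phi(I, \sigma) \;\overset{\text{def}}{=}\;& \big((0 \leq n \wedge 0 \leq m \leq n \wedge x = 0 \wedge y = m) \Rightarrow I(\sigma)\big) && \text{(precondition)}\\
\wedge\;& \big(\forall \sigma'\colon (I(\sigma) \wedge T(\sigma, \sigma')) \Rightarrow I(\sigma')\big) && \text{(inductiveness)}\\
\wedge\;& \big((x \geq n \wedge I(\sigma)) \Rightarrow y = n\big) && \text{(postcondition)}
\end{aligned}
$$

where $\sigma' = \langle m', n', x', y' \rangle$ denotes the new state after one iteration, and $T$ is a transition relation that describes the loop body:

$$
T(\sigma, \sigma') \;\overset{\text{def}}{=}\; (x < n) \wedge (x' = x + 1) \wedge (m' = m) \wedge (n' = n) \wedge \big[(x' \leq m \wedge y' = y) \vee (x' > m \wedge y' = y + 1)\big]
$$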
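A bounded test can build intuition for this specification before handing it to a verifier. The Python sketch below encodes the precondition, the transition relation T, and the postcondition of fib_19, and exhaustively checks a candidate invariant on every integer state whose components lie in [-bound, bound]. It is a sanity check, not a proof; the candidate `inv` encodes the solution invariant listed in Fig. 5, and all function names are our own.

```python
from itertools import product


def pre(m, n, x, y):
    return 0 <= n and 0 <= m <= n and x == 0 and y == m


def step(m, n, x, y):
    """Successor under T, or None if the loop guard x < n fails."""
    if not x < n:
        return None
    x2 = x + 1
    return m, n, x2, (y + 1 if x2 > m else y)


def inv(m, n, x, y):
    # (n >= y) and (y >= x) and ((m = y) or (x >= m and x >= y))
    return n >= y and y >= x and (m == y or (x >= m and x >= y))


def check(candidate, bound=6):
    """Test the three conjuncts of the specification on all states with
    components in [-bound, bound]; return the first violation found, if any."""
    for s in product(range(-bound, bound + 1), repeat=4):
        m, n, x, y = s
        if pre(*s) and not candidate(*s):
            return "precondition", s
        if candidate(*s):
            t = step(*s)
            if t is not None and not candidate(*t):
                return "inductiveness", s
            if x >= n and y != n:
                return "postcondition", s
    return None


print(check(inv))                          # None: no violation in the box
print(check(lambda m, n, x, y: y >= x))    # a weaker candidate is rejected
```

The weaker candidate y ≥ x fails because it admits states with x ≥ n but y ≠ n, whereas the reported solution passes every check in the box.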


Increasing expressiveness →

            Equalities   Intervals     Octagons      Polyhedra     Polynomials   Peano
            ✗ FAIL       0.32s         2.49s         2.48s         55.3s         68.0s
                         (19 rounds)   (57 rounds)   (57 rounds)   (76 rounds)   (88 rounds)

(a) Synthesis time and number of CEGIS iterations (rounds) with various grammars

[(b) Sample predicates with Polyhedra and (c) Sample predicates with Peano, generated at rounds 16, 28, and 57]

Solution in both grammars: (n ≥ y) ∧ (y ≥ x) ∧ ((m = y) ∨ (x ≥ m ∧ x ≥ y))

Fig. 5. Performance of LOOPINVGEN [29] on the fib_19 benchmark (Fig. 4). In (b) and (c), we show predicates generated at various rounds (numbered in bold).


In Fig. 5(a), we report the performance of LOOPINVGEN on fib_19 (Fig. 4) with our six grammars (Fig. 1). It succeeds with all but the least-expressive grammar. However, as grammar expressiveness increases, the number of rounds increases significantly, from 19 rounds with Intervals to 88 rounds with Peano.

LOOPINVGEN converges to the exact same invariant with both Polyhedra and Peano but requires 31 more rounds in the latter case (88 versus 57). In Figs. 5(b) and (c), we list some expressions synthesized with Polyhedra and Peano, respectively. These expressions are solutions to intermediate subproblems: the final loop invariant is a conjunction of a subset of these expressions [29, §3.2]. Observe that the expressions generated with the Peano grammar are quite complex and unlikely to generalize well. Peano’s extra expressiveness leads to more spurious candidates, increasing the chances of overfitting and making the benchmark harder to solve.


3 We use $\mathbb{B}$, $\mathbb{N}$, and $\mathbb{Z}$ to denote the sets of all Boolean values, all natural numbers (positive integers), and all integers, respectively.
3 SyGuS Overfitting in Theory


In this section, we first formalize the counterexample-guided inductive synthesis
(CEGIS) approach [37] to SyGuS, in which examples are iteratively provided 
by a verification oracle. We then state and prove no-free-lunch theorems, which 
show that there can be no optimal learner for this learning scheme. Finally, we 
formalize a natural notion of overfitting for SyGuS and prove that the potential 
for overfitting increases with grammar expressiveness. 


3.1 Preliminaries 
We borrow the formal definition of a SyGuS problem from prior work [3]: 


Definition 1 (SyGuS Problem). Given a background theory $T$, a function symbol $f\colon X \to Y$, and constraints on $f$: (1) a semantic constraint, also called a specification, $\phi(f, x)$ over the vocabulary of $T$ along with $f$ and a symbolic input $x$, and (2) a syntactic constraint, also called a grammar, given by a (possibly infinite) set $\mathcal{E}$ of expressions over the vocabulary of the theory $T$; find an expression $e \in \mathcal{E}$ such that the formula $\forall x \in X\colon \phi(e, x)$ is valid modulo $T$.

We denote this SyGuS problem as $\langle f_{X \to Y} \mid \phi, \mathcal{E} \rangle$, and say that it is satisfiable iff there exists such an expression $e$, i.e., $\exists e \in \mathcal{E}\colon \forall x \in X\colon \phi(e, x)$. We call $e$ a satisfying expression for this problem, denoted as $e \models \langle f_{X \to Y} \mid \phi, \mathcal{E} \rangle$.


Recall that we focus on a common class of SyGuS learners, namely those that
learn from examples. First we define the notion of input-output (IO) examples 
that are consistent with a SyGuS specification: 


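Definition 2 (Input-Output Example). Given a specification $\phi$ defined on $f\colon X \to Y$ over a background theory $T$, we call a pair $\langle x, y \rangle \in X \times Y$ an input-output (IO) example for $\phi$, denoted as $\langle x, y \rangle \approx_\phi$, iff it is satisfied by some valid interpretation of $f$ within $T$, i.e.,

$$
\langle x, y \rangle \approx_\phi \;\overset{\text{def}}{=}\; \exists e^* \in T\colon\; e^*(x) = y \,\wedge\, \big(\forall x' \in X\colon \phi(e^*, x')\big)
$$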


The next two definitions respectively formalize the two key components of a 
CEGIS-based SyGuS tool: the verification oracle and the learner. 


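Definition 3 (Verification Oracle). Given a specification $\phi$ defined on a function $f\colon X \to Y$ over theory $T$, a verification oracle $\mathcal{O}_\phi$ is a partial function that, given an expression $e$, either returns $\bot$, indicating that $\forall x \in X\colon \phi(e, x)$ holds, or gives a counterexample $\langle x, y \rangle$ against $e$, denoted as $e \rightsquigarrow_\phi \langle x, y \rangle$, such that

$$
e \rightsquigarrow_\phi \langle x, y \rangle \;\overset{\text{def}}{=}\; \neg\phi(e, x) \,\wedge\, e(x) \neq y \,\wedge\, \langle x, y \rangle \approx_\phi
$$

We omit $\phi$ from the notations $\mathcal{O}_\phi$ and $\rightsquigarrow_\phi$ when it is clear from the context.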
Definition 4 (CEGIS-based Learner). A CEGIS-based learner $\mathcal{L}^{\mathcal{O}}(q, \mathcal{E})$ is a partial function that, given an integer $q > 0$, a set $\mathcal{E}$ of expressions, and access to an oracle $\mathcal{O}$ for a specification $\phi$ defined on $f\colon X \to Y$, queries $\mathcal{O}$ at most $q$ times and either fails with $\bot$ or generates an expression $e \in \mathcal{E}$. The trace

$$
\big[\, e_0 \rightsquigarrow \langle x_0, y_0 \rangle,\ \ldots,\ e_{p-1} \rightsquigarrow \langle x_{p-1}, y_{p-1} \rangle,\ e_p \,\big] \quad \text{where } 0 \leq p < q
$$

summarizes the interaction between the oracle and the learner. Each $e_i$ denotes the $i$-th candidate for $f$, and $\langle x_i, y_i \rangle$ is a counterexample to $e_i$, i.e.,

$$
\big(\forall j < i\colon\; e_i(x_j) = y_j \,\wedge\, \phi(e_i, x_j)\big) \,\wedge\, \big(e_i \rightsquigarrow_\phi \langle x_i, y_i \rangle\big)
$$
Note that we have defined oracles and learners as (partial) functions, and hence as deterministic. In practice, many SyGuS tools are deterministic, and this assumption simplifies the subsequent theorems. However, we expect that these theorems can be appropriately generalized to randomized oracles and learners.
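The trace condition in Definition 4 is mechanical to check. The Python sketch below takes a trace of candidates (as callables) paired with their counterexamples, plus the final candidate, and verifies both conjuncts. The IO-example test ⟨x, y⟩ ≈ φ is supplied by the caller, since deciding it in general requires reasoning about all valid interpretations. The specification f(x) = 2x + 1 and all names are hypothetical.

```python
def valid_trace(trace, final, phi, is_io_example):
    """Check Definition 4's condition on a CEGIS trace.

    trace: list of (e_i, (x_i, y_i)) pairs, candidates given as callables;
    final: the last candidate e_p; phi(e, x): the specification;
    is_io_example(x, y): decides <x, y> ≈ phi (provided by the caller).
    """
    candidates = [e for e, _ in trace] + [final]
    cexs = [cex for _, cex in trace]
    for i, e in enumerate(candidates):
        # e_i agrees with, and satisfies phi on, every earlier counterexample
        if not all(e(xj) == yj and phi(e, xj) for xj, yj in cexs[:i]):
            return False
        if i < len(cexs):
            xi, yi = cexs[i]
            # e_i ~> <x_i, y_i>: phi fails at x_i, e_i(x_i) != y_i, IO example
            if phi(e, xi) or e(xi) == yi or not is_io_example(xi, yi):
                return False
    return True


phi = lambda f, x: f(x) == 2 * x + 1      # hypothetical specification
io = lambda x, y: y == 2 * x + 1          # its IO examples
trace = [(lambda x: -3 * x - 3, (-10, -19))]
print(valid_trace(trace, lambda x: 2 * x + 1, phi, io))  # True
print(valid_trace(trace, lambda x: x, phi, io))          # False
```

The second call fails because the final candidate disagrees with the earlier counterexample ⟨-10, -19⟩, which Definition 4 forbids.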


3.2 Learnability and No Free Lunch 


In the machine learning (ML) community, the limits of learning have been for- 
malized for various settings as no-free-lunch theorems [34, §5.1]. Here, we provide 
a natural form of such theorems for CEGIS-based SyGuS learning. 

In SyGuS, the learned function must conform to the given grammar, which 
may not be fully expressive. Therefore we first formalize grammar expressiveness: 


Definition 5 (k-Expressiveness). Given a domain $X$ and range $Y$, a grammar $\mathcal{E}$ is said to be $k$-expressive iff $\mathcal{E}$ can express exactly $k$ distinct $X \to Y$ functions.
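For a finite domain, the k in Definition 5 can be computed by identifying each expression with its tuple of outputs, since two expressions denote the same X → Y function exactly when they agree on every input. The sketch below assumes expressions are given as Python callables; `expressiveness` is a hypothetical helper, not part of any tool.

```python
from itertools import product
from typing import Callable, Iterable, Sequence


def expressiveness(grammar: Iterable[Callable[[int], int]], domain: Sequence[int]) -> int:
    """Number of distinct X -> Y functions expressed by a finite grammar (Definition 5).

    Each function is identified by its tuple of outputs on the finite domain."""
    return len({tuple(e(x) for x in domain) for e in grammar})


domain = [0, 1, 2]
# Four syntactically distinct expressions, but only two distinct functions.
grammar = [lambda x: x, lambda x: x * 1, lambda x: 0, lambda x: x - x]
print(expressiveness(grammar, domain))  # 2

# A grammar with one lookup-table expression per output tuple expresses every
# function from X = {0, 1} to Y = {0, 1, 2}, so it is |Y|^|X| = 9-expressive.
tables = [lambda x, t=t: t[x] for t in product(range(3), repeat=2)]
print(expressiveness(tables, [0, 1]))  # 9
```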


A key difference from the ML setting is our notion of m-learnability, which 
formalizes the number of examples that a learner requires in order to learn a 
desired function. In the ML setting, a function is considered to be m-learnable by a
learner if it can be learned using an arbitrary set of m i.i.d. examples (drawn from
some distribution). This makes sense in the ML setting since the learned function 
is allowed to make errors (up to some given bound on the error probability), but 
it is much too strong for the all-or-nothing SyGuS setting. 

Instead, we define a much weaker notion of m-learnability for CEGIS-based 
SyGuS, which only requires that there exist a set of m examples that allows the 
learner to succeed. The following definition formalizes this notion. 


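Definition 6 (CEGIS-based m-Learnability). Given a SyGuS problem $S = \langle f_{X \to Y} \mid \phi, \mathcal{E} \rangle_T$ and an integer $m \geq 0$, we say that $S$ is $m$-learnable by a CEGIS-based learner $\mathcal{L}$ iff there exists a verification oracle $\mathcal{O}$ under which $\mathcal{L}$ can learn a satisfying expression for $S$ with at most $m$ queries to $\mathcal{O}$, i.e., $\exists \mathcal{O}: \mathcal{L}^{\mathcal{O}}(m, \mathcal{E}) \models S$.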


Finally we state and prove the no-free-lunch (NFL) theorems, which make 
explicit the tradeoff between grammar expressiveness and learnability. Intu- 
itively, given an integer m and an expressive enough (as a function of m) gram- 
mar, for every learner there exists a SyGuS problem that cannot be solved with- 
out access to at least m + 1 examples. This is true despite our weak notion of 
learnability. 
Put another way, as grammar expressiveness increases, so does the number of examples required for learning. On one extreme, if the given grammar is 1-expressive, i.e., can express exactly one function, then all satisfiable SyGuS problems are 0-learnable (no examples are needed because there is only one function to learn), but there are many SyGuS problems that cannot be satisfied by this function. On the other extreme, if the grammar is $|Y|^{|X|}$-expressive, i.e., can express all functions from $X$ to $Y$, then for every learner there exists a SyGuS problem that requires all $|X|$ examples in order to be solved.

Below we first present the NFL theorem for the case when the domain X 
and range Y are finite. We then generalize to the case when these sets may 
be countably infinite. We provide the proofs of these theorems in the extended 
version of this paper [27, Appendix A.1]. 


Theorem 1 (NFL in CEGIS-based SyGuS on Finite Sets). Let $X$ and $Y$ be two arbitrary finite sets, $T$ be a theory that supports equality, $\mathcal{E}$ be a grammar over $T$, and $m$ be an integer such that $0 < m < |X|$. Then, either:

— E is not k-expressive for any k > ar ae or 

— for every CEGIS-based learner $\mathcal{L}$, there exists a satisfiable SyGuS problem $S = \langle f_{X \to Y} \mid \phi, \mathcal{E} \rangle_T$ such that $S$ is not $m$-learnable by $\mathcal{L}$. Moreover, there exists a different CEGIS-based learner for which $S$ is $m$-learnable.


Theorem 2 (NFL in CEGIS-based SyGuS on Countably Infinite Sets). 
Let X be an arbitrary countably infinite set, Y be an arbitrary finite or countably 
infinite set, T be a theory that supports equality, E be a grammar over T, and m 
be an integer such that m > 0. Then, either: 


— $\mathcal{E}$ is not $k$-expressive for any $k > \aleph_0$, where $\aleph_0 \overset{\text{def}}{=} |\mathbb{N}|$, or

— for every CEGIS-based learner $\mathcal{L}$, there exists a satisfiable SyGuS problem $S = \langle f_{X \to Y} \mid \phi, \mathcal{E} \rangle_T$ such that $S$ is not $m$-learnable by $\mathcal{L}$. Moreover, there exists a different CEGIS-based learner for which $S$ is $m$-learnable.


3.3 Overfitting 


Last, we relate the above theory to the notion of overfitting from ML. In the 
context of SyGuS, overfitting can potentially occur whenever there are multiple 
candidate expressions that are consistent with a given set of examples. Some of 
these expressions may not generalize to satisfy the specification, but the learner 
has no way to distinguish among them (using just the given set of examples) and 
so can “guess” incorrectly. We formalize this idea through the following measure: 


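Definition 7 (Potential for Overfitting). Given a problem $S = \langle f_{X \to Y} \mid \phi, \mathcal{E} \rangle_T$ and a set $Z$ of IO examples for $\phi$, we define the potential for overfitting $\Omega$ as the number of expressions in $\mathcal{E}$ that are consistent with $Z$ but do not satisfy $S$, i.e.,

$$\Omega(S, Z) \;\overset{\text{def}}{=}\; \begin{cases} \big|\{\, e \in \mathcal{E} \mid e \not\models S \,\wedge\, \forall (x, y) \in Z:\ e(x) = y \,\}\big| & \text{if } \forall z \in Z:\ z \approx_T \phi \\ \bot \ \text{(undefined)} & \text{otherwise} \end{cases}$$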
Intuitively, a zero potential for overfitting means that overfitting is not possible on the given problem with respect to the given set of examples, because there is no spurious candidate. A positive potential for overfitting means that overfitting is possible, and higher values imply more spurious candidates and hence more potential for a learner to choose the “wrong” expression.
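A small sketch of Ω(S, Z) from Definition 7 in a finite toy setting. As an illustration only, it assumes the specification φ(e, x) := e(x) = target(x) over a finite domain and a grammar given as a dict from expression names to callables; `None` stands for the undefined value ⊥, and all names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Semantics = Callable[[int], int]


def overfitting_potential(grammar: Dict[str, Semantics], domain: List[int],
                          target: Semantics,
                          examples: List[Tuple[int, int]]) -> Optional[int]:
    """Omega(S, Z) for the toy specification phi(e, x) := e(x) == target(x)."""
    # Omega is undefined (None here) unless every (x, y) in Z is a valid IO example.
    if any(y != target(x) for x, y in examples):
        return None
    return sum(
        1
        for sem in grammar.values()
        if any(sem(x) != target(x) for x in domain)    # e does not satisfy S ...
        and all(sem(x) == y for x, y in examples)      # ... but e is consistent with Z
    )


def square(x: int) -> int:
    return x * x


domain = list(range(4))
small = {"x": lambda x: x, "x*x": square}
large = {**small, "3*x-2": lambda x: 3 * x - 2}  # a supergrammar of `small`
Z = [(1, 1), (2, 4)]
print(overfitting_potential(small, domain, square, Z))  # 0
print(overfitting_potential(large, domain, square, Z))  # 1
```

Adding 3*x-2 creates a spurious candidate that agrees with x*x on both examples but not at x = 0, so the potential grows from 0 to 1, in line with the monotonicity that Theorem 5 states.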

The following theorems connect our notion of overfitting to the earlier NFL 
theorems by showing that overfitting is inevitable with an expressive enough 
grammar. The proofs of these theorems can be found in the extended version of 
this paper [27, Appendix A.2]. 


Theorem 3 (Overfitting in SyGuS on Finite Sets). Let X and Y be two 
arbitrary finite sets, m be an integer such that 0 < m < |X|, T be a theory 
that supports equality, and $\mathcal{E}$ be a $k$-expressive grammar over $T$ for some $k >$ Soa. Then, there exists a satisfiable SyGuS problem $S = \langle f_{X \to Y} \mid \phi, \mathcal{E} \rangle_T$ such that $\Omega(S, Z) > 0$ for every set $Z$ of $m$ IO examples for $\phi$.


Theorem 4 (Overfitting in SyGuS on Countably Infinite Sets). Let $X$ be an arbitrary countably infinite set, $Y$ be an arbitrary finite or countably infinite set, $T$ be a theory that supports equality, and $\mathcal{E}$ be a $k$-expressive grammar over $T$ for some $k > \aleph_0$. Then, there exists a satisfiable SyGuS problem $S = \langle f_{X \to Y} \mid \phi, \mathcal{E} \rangle_T$ such that $\Omega(S, Z) > 0$ for every set $Z$ of $m$ IO examples for $\phi$.


Finally, it is straightforward to show that as the expressiveness of the gram- 
mar provided in a SyGuS problem increases, so does its potential for overfitting. 


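Theorem 5 (Overfitting Increases with Expressiveness). Let $X$ and $Y$ be two arbitrary sets, $T$ be an arbitrary theory, $\mathcal{E}_1$ and $\mathcal{E}_2$ be grammars over $T$ such that $\mathcal{E}_1 \subseteq \mathcal{E}_2$, $\phi$ be an arbitrary specification over $T$ and a function symbol $f\colon X \to Y$, and $Z$ be a set of IO examples for $\phi$. Then, we have

$$\Omega\big(\langle f_{X \to Y} \mid \phi, \mathcal{E}_1 \rangle_T, Z\big) \leq \Omega\big(\langle f_{X \to Y} \mid \phi, \mathcal{E}_2 \rangle_T, Z\big)$$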


4 Mitigating Overfitting 


Ensemble methods [13] in machine learning (ML) are a standard approach to 
reduce overfitting. These methods aggregate predictions from several learners to 
make a more accurate prediction. In this section we propose two approaches, 
inspired by ensemble methods in ML, for mitigating overfitting in SyGuS. Both 
are based on the key insight from Sect. 3.3 that synthesis over a subgrammar has 
a smaller potential for overfitting as compared to that over the original grammar. 


4.1 Parallel SyGuS on Multiple Grammars 


Our first idea is to run multiple parallel instances of a synthesizer on the same 
SyGuS problem but with grammars of varying expressiveness. This framework, 
called PLEARN, is outlined in Algorithm 1. It accepts a synthesis tool $\mathcal{T}$, a SyGuS
Algorithm 1. The PLEARN framework for SyGuS tools.
func PLEARN(𝒯: Synthesis Tool, ⟨f_{X→Y} | φ, ℰ⟩_T: Problem, ℰ_{1…p}: Subgrammars)
  ▷ Requires: ∀ℰ_i ∈ ℰ_{1…p}: ℰ_i ⊆ ℰ
  parallel for i ← 1, …, p do
    S_i ← ⟨f_{X→Y} | φ, ℰ_i⟩_T
    e_i ← 𝒯(S_i)
    if e_i ≠ ⊥ then return e_i
  return ⊥


problem $\langle f_{X \to Y} \mid \phi, \mathcal{E} \rangle_T$, and subgrammars $\mathcal{E}_{1\ldots p}$⁴ such that $\mathcal{E}_i \subseteq \mathcal{E}$. The parallel for construct creates a new thread for each iteration. The loop in PLEARN creates $p$ copies of the SyGuS problem, each with a different grammar from $\mathcal{E}_{1\ldots p}$, and dispatches each copy to a new instance of the tool $\mathcal{T}$. PLEARN returns the first solution found, or $\bot$ if none of the synthesizer instances succeed.

Since each grammar in $\mathcal{E}_{1\ldots p}$ is subsumed by the original grammar $\mathcal{E}$, any expression found by PLEARN is a solution to the original SyGuS problem. Moreover, from Theorem 5 it is immediate that PLEARN indeed reduces overfitting.
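The structure of Algorithm 1 maps naturally onto a thread pool. The sketch below is a hypothetical Python rendering, assuming that a synthesizer is any callable taking a (specification, grammar) pair and returning a solution or `None` (for ⊥), and that grammars are represented as frozensets of expression names so that the precondition E_i ⊆ E becomes a subset test. `plearn`, `toy_tool`, and `SEMANTICS` are illustrative names only.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, FrozenSet, List, Optional, Tuple

# Toy problem: (target function standing in for phi, grammar as a set of expression names).
Problem = Tuple[Callable[[int], int], FrozenSet[str]]
Tool = Callable[[Problem], Optional[str]]


def plearn(tool: Tool, problem: Problem, subgrammars: List[FrozenSet[str]]) -> Optional[str]:
    """Run one instance of `tool` per subgrammar in parallel; return the first solution."""
    spec, grammar = problem
    assert all(g <= grammar for g in subgrammars), "requires E_i ⊆ E"
    pool = ThreadPoolExecutor(max_workers=max(1, len(subgrammars)))
    try:
        # S_i = <f | phi, E_i>: same specification, different grammar.
        futures = [pool.submit(tool, (spec, g)) for g in subgrammars]
        for fut in as_completed(futures):
            solution = fut.result()
            if solution is not None:
                return solution
        return None  # ⊥: no instance succeeded
    finally:
        # Stop waiting for slower instances (cancel_futures needs Python 3.9+).
        # Threads that are already running are not interrupted.
        pool.shutdown(wait=False, cancel_futures=True)


SEMANTICS = {"0": lambda x: 0, "x": lambda x: x, "x*x": lambda x: x * x}


def toy_tool(problem: Problem) -> Optional[str]:
    """Hypothetical synthesizer: first name (alphabetically) that matches on 0..3."""
    target, grammar = problem
    for name in sorted(grammar):
        if all(SEMANTICS[name](x) == target(x) for x in range(4)):
            return name
    return None


E = frozenset(SEMANTICS)
subgrammars = [frozenset({"0", "x"}), E]
print(plearn(toy_tool, (lambda x: x * x, E), subgrammars))  # x*x
```

Python threads cannot be forcibly stopped, so this sketch only stops waiting for the slower instances; a practical wrapper around real SyGuS solvers would more likely run each instance as a separate process that can be terminated once one instance succeeds.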


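Theorem 6 (PLEARN Reduces Overfitting). Given a SyGuS problem $S = \langle f_{X \to Y} \mid \phi, \mathcal{E} \rangle_T$, if PLEARN is instantiated with $S$ and subgrammars $\mathcal{E}_{1\ldots p}$ such that $\forall \mathcal{E}_i \in \mathcal{E}_{1\ldots p}: \mathcal{E}_i \subseteq \mathcal{E}$, then for each $S_i = \langle f_{X \to Y} \mid \phi, \mathcal{E}_i \rangle_T$ constructed by PLEARN, we have $\Omega(S_i, Z) \leq \Omega(S, Z)$ on any set $Z$ of IO examples for $\phi$.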


A key advantage of PLEARN is that it is agnostic to the synthesizer’s imple- 
mentation. Therefore, existing SyGuS learners can immediately benefit from 
PLEARN, as we demonstrate in Sect. 5.1. However, running p parallel SyGuS
instances can be prohibitively expensive, both computationally and memory- 
wise. The problem is worsened by the fact that many existing SyGuS tools 
already use multiple threads, e.g., the SKETCHAC [20] tool spawns 9 threads. 
This motivates our hybrid enumeration technique described next, which is a 
novel synthesis algorithm that interleaves exploration of multiple grammars in 
a single thread. 


4.2 Hybrid Enumeration 


Hybrid enumeration extends the enumerative synthesis technique, which enu- 
merates expressions within a given grammar in order of size and returns the 
first candidate that satisfies the given examples [2]. Our goal is to simulate 
the behavior of PLEARN with an enumerative synthesizer in a single thread. 
However, a straightforward interleaving of multiple PLEARN threads would be 
highly inefficient because of redundancies — enumerating the same expression 
(which is contained in multiple grammars) multiple times. Instead, we propose 
a technique that (1) enumerates each expression at most once, and (2) reuses 
previously enumerated expressions to construct larger expressions. 


4 We use the shorthand $X_{1,\ldots,n}$ to denote the sequence $\langle X_1, \ldots, X_n \rangle$.




326 S. Padhi et al. 


To achieve this, we extend a widely used [2,15,31] synthesis strategy, called
component-based synthesis [21], wherein the grammar of expressions is induced 
by a set of components, each of which is a typed operator with a fixed arity. 
For example, the grammars shown in Fig. 1 are induced by integer components 
(such as 1, +, mod, =, etc.) and Boolean components (such as true, and, or, etc.). 
Below, we first formalize the grammar that is implicit in this synthesis style. 


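Definition 8 (Component-Based Grammar). Given a set $\mathcal{C}$ of typed components, we define the component-based grammar $\mathcal{E}$ as the set of all expressions formed by well-typed component application over $\mathcal{C}$, i.e.,

$$\mathcal{E} = \{\, c(e_1, \ldots, e_a) \mid (c : \tau_1 \times \cdots \times \tau_a \to \tau) \in \mathcal{C} \,\wedge\, e_1 \in \mathcal{E} \wedge \cdots \wedge e_a \in \mathcal{E} \,\wedge\, e_1 : \tau_1 \wedge \cdots \wedge e_a : \tau_a \,\}$$

where $e : \tau$ denotes that the expression $e$ has type $\tau$.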


We denote the set of all components appearing in a component-based grammar $\mathcal{E}$ as components($\mathcal{E}$). Henceforth, we assume that components($\mathcal{E}$) is known (explicitly provided by the user) for each $\mathcal{E}$. We also use values($\mathcal{E}$) to denote the subset of nullary components (variables and constants) in components($\mathcal{E}$), and operators($\mathcal{E}$) to denote the remaining components with positive arities.

The closure property of component-based grammars significantly reduces the 
overhead of tracking which subexpressions can be combined together to form 
larger expressions. Given a SyGuS problem over a grammar $\mathcal{E}$, hybrid enumeration requires a sequence $\mathcal{E}_{1\ldots p}$ of grammars such that each $\mathcal{E}_i$ is a component-based grammar and that $\mathcal{E}_1 \subset \cdots \subset \mathcal{E}_p \subseteq \mathcal{E}$. Next, we explain how the subset relationship between the grammars enables efficient enumeration of expressions.

Given grammars $\mathcal{E}_1 \subset \cdots \subset \mathcal{E}_p$, observe that an expression of size $k$ in $\mathcal{E}_j$ may only contain subexpressions of size $\{1, \ldots, (k-1)\}$ belonging to $\mathcal{E}_{1\ldots j}$. This
allows us to enumerate expressions in an order such that each subexpression e is 
synthesized (and cached) before any expressions that have e as a subexpression. 
We call an enumeration order that ensures this property a well order. 


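Definition 9 (Well Order). Given arbitrary grammars $\mathcal{E}_{1\ldots p}$, we say that a strict partial order $\lhd$ on $\mathcal{E}_{1\ldots p} \times \mathbb{N}$ is a well order iff

$$\forall \mathcal{E}_a, \mathcal{E}_b \in \mathcal{E}_{1\ldots p}:\ \forall k_1, k_2 \in \mathbb{N}:\ [\mathcal{E}_a \subseteq \mathcal{E}_b \wedge k_1 < k_2] \Rightarrow (\mathcal{E}_a, k_1) \lhd (\mathcal{E}_b, k_2)$$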


Motivated by Theorem 5, our implementation of hybrid enumeration uses a 
particular well order that incrementally increases the expressiveness of the space 
of expressions. For a rough measure of the expressiveness (Definition 5) of a pair 
(E, k), i.e., the set of expressions of size k in a given grammar E, we simply 
overapproximate the number of syntactically distinct expressions: 


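Theorem 7. Let $\mathcal{E}_{1\ldots p}$ be component-based grammars and $\mathcal{C}_i = \text{components}(\mathcal{E}_i)$. Then, the following strict partial order $\lhd_\times$ on $\mathcal{E}_{1\ldots p} \times \mathbb{N}$ is a well order:

$$\forall \mathcal{E}_a, \mathcal{E}_b \in \mathcal{E}_{1\ldots p}:\ \forall m, n \in \mathbb{N}:\ (\mathcal{E}_a, m) \lhd_\times (\mathcal{E}_b, n) \;\overset{\text{def}}{=}\; |\mathcal{C}_a|^m < |\mathcal{C}_b|^n$$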
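Line 9 of Algorithm 2 needs some total order consistent with ◁_×. A minimal Python sketch, assuming the grammars form a chain E_1 ⊂ ⋯ ⊂ E_p indexed by position and that only the component counts |C_j| are known: (grammar, size) pairs are sorted by |C_j|^k with ties broken by (j, k), and `is_well_order` checks the condition of Definition 9 on the result. Both function names are hypothetical.

```python
from itertools import product
from typing import List, Tuple


def sort_pairs(num_components: List[int], q: int) -> List[Tuple[int, int]]:
    """Pairs (j, k) for E_j (1-indexed) and sizes 2..q, sorted by |C_j| ** k,
    with ties broken by (j, k): one total order consistent with the product order."""
    pairs = [(j, k) for j in range(1, len(num_components) + 1) for k in range(2, q + 1)]
    return sorted(pairs, key=lambda jk: (num_components[jk[0] - 1] ** jk[1], jk))


def is_well_order(order: List[Tuple[int, int]]) -> bool:
    """Definition 9 for a chain E_1 ⊂ ... ⊂ E_p: (E_a, k1) must come before (E_b, k2)
    whenever a <= b and k1 < k2."""
    pos = {pair: i for i, pair in enumerate(order)}
    return all(pos[(a, k1)] < pos[(b, k2)]
               for (a, k1), (b, k2) in product(order, repeat=2)
               if a <= b and k1 < k2)


order = sort_pairs([3, 4], 3)
print(order)                 # [(1, 2), (2, 2), (1, 3), (2, 3)]
print(is_well_order(order))  # True
```

With |C_1| = 3 and |C_2| = 4, size-2 expressions of E_2 (4² = 16) are explored before size-3 expressions of E_1 (3³ = 27), so the search interleaves the two grammars instead of exhausting E_1 first.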
We now describe the main hybrid enumeration algorithm, which is listed in Algorithm 2. The HENUM function accepts a SyGuS problem $\langle f_{X \to Y} \mid \phi, \mathcal{E} \rangle_T$, a set $\mathcal{E}_{1\ldots p}$ of component-based grammars such that $\mathcal{E}_1 \subset \cdots \subset \mathcal{E}_p \subseteq \mathcal{E}$, a well order $\lhd$, and an upper bound $q > 0$ on the size of expressions to enumerate. In lines 4-8, we first enumerate all values and cache them as expressions of size one. In general, $C[j, k][\tau]$ contains expressions of type $\tau$ and size $k$ from $\mathcal{E}_j \setminus \mathcal{E}_{j-1}$. In line 9, we sort (grammar, size) pairs in some total order consistent with $\lhd$. Finally, in lines 10-20, we iterate over each pair $(\mathcal{E}_j, k)$ and each operator from $\mathcal{E}_{1\ldots j}$, and invoke the DIVIDE procedure (Algorithm 3) to carefully choose the operator's argument subexpressions, ensuring (1) correctness: their sizes sum up to $k - 1$; (2) efficiency: expressions are enumerated at most once; and (3) completeness: all expressions of size $k$ in $\mathcal{E}_j$ are enumerated.

The DIVIDE algorithm generates a set of locations for selecting arguments to an operator. Each location is a pair $(x, y)$ indicating that any expression from $C[x, y][\tau]$ can be an argument, where $\tau$ is the argument type required by the operator. DIVIDE accepts an arity $a$ for an operator $o$, a size budget $q$, the index $l$ of the least-expressive grammar containing $o$, the index $j$ of the least-expressive grammar that should contain the constructed expressions of the form $o(e_1, \ldots, e_a)$, and an accumulator $\alpha$ that stores the list of argument locations. In lines 7-9, the size budget is recursively divided among $a - 1$ locations. In each recursive step, the upper bound $(q - a + 1)$ on $v$ ensures that we have a size budget of at least $q - (q - a + 1) = a - 1$ for the remaining $a - 1$ locations. This results in a call tree such that the accumulator $\alpha$ at each leaf node contains the locations from which to select the last $a - 1$ arguments, and we are left with some size budget $q \geq 1$ for the first argument $e_1$. Finally, in lines 4-5, we carefully select the locations for $e_1$ to ensure that $o(e_1, \ldots, e_a)$ has not been synthesized before: either $o$ first appears in $\mathcal{E}_j$ (i.e., $l = j$) or at least one argument belongs to $\mathcal{E}_j \setminus \mathcal{E}_{j-1}$.⁵

We conclude by stating some desirable properties satisfied by HENUM. Their 
proofs are provided in the extended version of this paper [27, Appendix A.3]. 


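Theorem 8 (HENUM is Complete up to Size q). Given a SyGuS problem $S = \langle f_{X \to Y} \mid \phi, \mathcal{E} \rangle_T$, let $\mathcal{E}_{1\ldots p}$ be component-based grammars over theory $T$ such that $\mathcal{E}_1 \subset \cdots \subset \mathcal{E}_p = \mathcal{E}$, $\lhd$ be a well order on $\mathcal{E}_{1\ldots p} \times \mathbb{N}$, and $q > 0$ be an upper bound on size of expressions. Then, HENUM$(S, \mathcal{E}_{1\ldots p}, \lhd, q)$ will eventually find a satisfying expression if there exists one with size $\leq q$.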


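Theorem 9 (HENUM is Efficient). Given a SyGuS problem $S = \langle f_{X \to Y} \mid \phi, \mathcal{E} \rangle_T$, let $\mathcal{E}_{1\ldots p}$ be component-based grammars over theory $T$ such that $\mathcal{E}_1 \subset \cdots \subset \mathcal{E}_p \subseteq \mathcal{E}$, $\lhd$ be a well order on $\mathcal{E}_{1\ldots p} \times \mathbb{N}$, and $q > 0$ be an upper bound on size of expressions. Then, HENUM$(S, \mathcal{E}_{1\ldots p}, \lhd, q)$ will enumerate each distinct expression at most once.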


5 We use $\circ$ as the cons operator for sequences, e.g., $x \circ \langle y, z \rangle = \langle x, y, z \rangle$.
Algorithm 2. Hybrid enumeration to combat overfitting in SyGuS
func HENUM(⟨f_{X→Y} | φ, ℰ⟩_T: Problem, ℰ_{1…p}: Grammars, ◁: WO, q: Max. Size)
  ▷ Requires: component-based grammars ℰ_1 ⊂ ⋯ ⊂ ℰ_p ⊆ ℰ and v as the input variable
  C ← {}
  for i ← 1 to p do
    V ← if i = 1 then values(ℰ_1) else [values(ℰ_i) \ values(ℰ_{i−1})]
    for each (e : τ) ∈ V do
      C[i, 1][τ] ← C[i, 1][τ] ∪ {e}
      if ∀x ∈ X: φ(λv. e, x) then return λv. e
  R ← SORT(◁, ℰ_{1…p} × {2, …, q})
  for i ← 1 to |R| do
    (ℰ_j, k) ← R[i]
    for l ← 1 to j do
      O ← if l = 1 then operators(ℰ_1) else [operators(ℰ_l) \ operators(ℰ_{l−1})]
      for each (o : τ_1 × ⋯ × τ_a → τ) ∈ O do
        L ← DIVIDE(a, k − 1, l, j, ⟨⟩)
        for each ⟨(x_1, y_1), …, (x_a, y_a)⟩ ∈ L do
          for each ⟨e_1, …, e_a⟩ ∈ C[x_1, y_1][τ_1] × ⋯ × C[x_a, y_a][τ_a] do
            e ← o(e_1, …, e_a)
            C[j, k][τ] ← C[j, k][τ] ∪ {e}
            if ∀x ∈ X: φ(λv. e, x) then return λv. e
  return ⊥


Algorithm 3. An algorithm to divide a given size budget among subexpressions⁵
func DIVIDE(a: Arity, q: Size, l: Op. Level, j: Expr. Level, α: Accumulated Args.)
  ▷ Requires: 1 ≤ a ≤ q ∧ l ≤ j
  if a = 1 then
    if l = j ∨ ∃(x, y) ∈ α: x = j then return {(1, q) ∘ α, …, (j, q) ∘ α}
    return {(j, q) ∘ α}
  L ← {}
  for u ← 1 to j do
    for v ← 1 to (q − a + 1) do
      L ← L ∪ DIVIDE(a − 1, q − v, l, j, (u, v) ∘ α)
  return L
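Algorithms 2 and 3 can be transcribed into Python fairly directly. The sketch below is a hypothetical single-input rendering, not the authors' implementation: components live in a dict `COMPS` mapping names to (argument types, result type, semantics), `levels[j - 1]` is the set of component names of E_j (each level containing the previous one), expressions are nested tuples, and `is_solution` stands in for the check ∀x ∈ X: φ(λv. e, x) by testing a finite sample of inputs. Ties in the ◁_× order of Theorem 7 are broken by (j, k). The function also records every enumerated expression so that the at-most-once property of Theorem 9 can be checked.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

# Hypothetical component table: name -> (argument types, result type, semantics).
# Values (arity 0) map the input x to a value; operators map argument values to a value.
Component = Tuple[Tuple[str, ...], str, Callable[..., int]]
Expr = tuple  # (component name, *child expressions)
Loc = Tuple[int, int]  # (grammar level, expression size)


def divide(a: int, q: int, l: int, j: int, acc: Tuple[Loc, ...]) -> Set[Tuple[Loc, ...]]:
    """Algorithm 3: argument locations for an arity-a operator with size budget q."""
    if a == 1:
        if l == j or any(x == j for x, _ in acc):
            return {((x, q),) + acc for x in range(1, j + 1)}
        return {((j, q),) + acc}  # first argument must be new at level j
    locs: Set[Tuple[Loc, ...]] = set()
    for u in range(1, j + 1):
        for v in range(1, q - a + 2):  # v <= q - a + 1 leaves >= a - 1 for the rest
            locs |= divide(a - 1, q - v, l, j, ((u, v),) + acc)
    return locs


def henum(comps: Dict[str, Component], levels: List[FrozenSet[str]], q: int,
          is_solution: Callable[[Expr], bool]):
    """Algorithm 2 (sketch); levels[j - 1] holds the component names of E_j."""
    cache: Dict[Loc, Dict[str, List[Expr]]] = {}  # C[j, k][tau]
    seen: List[Expr] = []  # every enumerated expression, in order

    def new_at(i: int) -> List[str]:  # components first introduced by E_i
        prev = levels[i - 2] if i > 1 else frozenset()
        return sorted(levels[i - 1] - prev)

    def add(j: int, k: int, tau: str, e: Expr) -> bool:
        cache.setdefault((j, k), {}).setdefault(tau, []).append(e)
        seen.append(e)
        return is_solution(e)

    for i in range(1, len(levels) + 1):  # values are the expressions of size 1
        for name in new_at(i):
            args, tau, _ = comps[name]
            if not args and add(i, 1, tau, (name,)):
                return (name,), seen
    pairs = [(j, k) for j in range(1, len(levels) + 1) for k in range(2, q + 1)]
    pairs.sort(key=lambda jk: (len(levels[jk[0] - 1]) ** jk[1], jk))  # Theorem 7
    for j, k in pairs:
        for l in range(1, j + 1):
            for name in new_at(l):
                args, tau, _ = comps[name]
                if not args or len(args) > k - 1:
                    continue  # values were handled above, or too little size budget
                for locs in sorted(divide(len(args), k - 1, l, j, ())):
                    pools = [cache.get(loc, {}).get(t, []) for loc, t in zip(locs, args)]
                    for children in product(*pools):
                        e = (name, *children)
                        if add(j, k, tau, e):
                            return e, seen
    return None, seen


def evaluate(comps: Dict[str, Component], e: Expr, x: int) -> int:
    name, *children = e
    fn = comps[name][2]
    return fn(x) if not children else fn(*(evaluate(comps, c, x) for c in children))


COMPS: Dict[str, Component] = {
    "x": ((), "int", lambda x: x),
    "1": ((), "int", lambda x: 1),
    "+": (("int", "int"), "int", lambda a, b: a + b),
    "*": (("int", "int"), "int", lambda a, b: a * b),
}
LEVELS = [frozenset({"x", "1", "+"}), frozenset({"x", "1", "+", "*"})]


def is_square(e: Expr) -> bool:  # stands in for: for all x in X, phi(lambda v. e, x)
    return all(evaluate(COMPS, e, x) == x * x for x in range(-3, 4))


found, seen = henum(COMPS, LEVELS, 3, is_square)
print(found, len(seen))  # ('*', ('x',), ('x',)) 10
```

On this toy problem, the level-1 grammar {x, 1, +} yields only four sums at size 3 before the multiplication introduced at level 2 produces x*x, and no expression is enumerated twice.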


5 Experimental Evaluation 


In this section we empirically evaluate PLEARN and HENUM. Our evaluation uses a set of 180 synthesis benchmarks,⁶ consisting of all 127 official benchmarks from the Inv track of the 2018 SyGuS competition [4], augmented with benchmarks from the 2018 Software Verification competition (SV-Comp) [8] and challenging verification problems proposed in prior work [9,10]. All these synthesis tasks are


6 All benchmarks are available at https://github.com/SaswatPadhi/LoopInvGen.
defined over integer and Boolean values, and we evaluate them with the six grammars described in Fig. 1. We have omitted benchmarks from other tracks of the SyGuS competition as they either require us to construct $\mathcal{E}_{1\ldots p}$ (Sect. 4) by hand or lack verification oracles. All our experiments use an 8-core Intel® Xeon® E5 machine clocked at 2.30 GHz with 32 GB memory running Ubuntu® 18.04.


5.1 Robustness of PLEARN 


We have compared the performance of five state-of-the-art SyGuS solvers, (a) LOOPINVGEN [29], (b) CVC4 [7,33], (c) STOCH [3, §III F], (d) SKETCHAC [20,37], and (e) EUSOLVER [5], across various grammars, with and without the PLEARN framework (Algorithm 1). In this framework, to solve a SyGuS problem with the p-th expressiveness level from our six integer-arithmetic grammars (see Fig. 1), we run p independent parallel instances of a SyGuS tool, each with one of the first p grammars. For example, to solve a SyGuS problem with the Polyhedra grammar, we run four instances of a solver with the Equalities, Intervals, Octagons and Polyhedra grammars. We evaluate these runs for each tool, for each of the 180 benchmarks and for each of the six expressiveness levels.


[Fig. 6 plots, one panel per tool: (a) LOOPINVGEN [29], (b) CVC4 [7,33], (c) STOCH [3, §III F], (d) SKETCHAC [20,37], (e) EUSOLVER [5]. Each panel shows the number of failures for grammars of increasing expressiveness; solid blue curves show original failure counts, and dashed orange curves show failure counts with PLEARN. Timeout = 30 min (wall-clock).]


Fig. 6. The number of failures on increasing grammar expressiveness, for state-of-the- 
art SyGuS tools, with and without the PLEARN framework (Algorithm 1) 


Figure 6 summarizes our findings. Without PLEARN the number of failures 
initially decreases and then increases across all solvers, as grammar expressive- 
ness increases. However, with PLEARN the tools incur fewer failures at a given 
level of expressiveness, and there is a trend of decreased failures with increased 
expressiveness. Thus, we have demonstrated that PLEARN is an effective mea- 
sure to mitigate overfitting in SyGuS tools and significantly improve their 
performance. 
5.2 Performance of Hybrid Enumeration


To evaluate the performance of hybrid enumeration, we augment an existing syn- 
thesis engine with HENUM (Algorithm 2). We modify our LOOPINVGEN tool [29], 
which is the best-performing SyGuS synthesizer from Fig.6. Internally, LOOP- 
INVGEN leverages ESCHER [2], an enumerative synthesizer, which we replace 
with HENUM. We make no other changes to LOOPINVGEN. We evaluate the 
performance and resource usage of this solver, LOOPINVGEN+HE, relative to 
the original LOOPINVGEN with and without PLEARN (Algorithm 1). 


Performance. In Fig. 7(a), we show the number of failures across our six grammars for LOOPINVGEN, LOOPINVGEN+HE and LOOPINVGEN with PLEARN, over our 180 benchmarks. LOOPINVGEN+HE has a significantly lower failure rate than LOOPINVGEN, and the number of failures decreases with grammar expressiveness. Thus, hybrid enumeration is a good proxy for PLEARN.


[Fig. 7(a) plot: number of failures (y-axis) for each grammar, Equalities through Peano (x-axis), for LOOPINVGEN, LOOPINVGEN+HE, and PLEARN(LOOPINVGEN).]

(b) Median (M) overhead in total-time cost τ:

Grammar       M[τ(P)/τ(H)]   M[τ(H)/τ(L)]
Equalities    1.00           1.00
Intervals     1.91           1.04
Octagons      2.84           1.03
Polyhedra     3.72           1.01
Polynomials   4.62           1.00
Peano         5.49           0.97


Fig. 7. L = LOOPINVGEN, H = LOOPINVGEN+HE, P = PLEARN(LOOPINVGEN). H is not only significantly more robust against increasing grammar expressiveness, but it also has a smaller total-time cost (τ) than P and a negligible overhead over L.


Resource Usage. To estimate how computationally expensive each solver is, we compare their total-time cost (τ). Since LOOPINVGEN and LOOPINVGEN+HE are single-threaded, for them we simply use the wall-clock time for synthesis as the total-time cost. However, for PLEARN with p parallel instances of LOOPINVGEN, we consider the total-time cost as p times the wall-clock time for synthesis. In Fig. 7(b), we show the median overhead (ratio of τ) incurred by PLEARN over LOOPINVGEN+HE and by LOOPINVGEN+HE over LOOPINVGEN, at various expressiveness levels. As we move to grammars of increasing expressiveness, the total-time cost of PLEARN increases significantly, while the total-time cost of LOOPINVGEN+HE essentially matches that of LOOPINVGEN.


5.3 Competition Performance 


Finally, we evaluate the performance of LOOPINVGEN+HE on the benchmarks
from the Inv track of the 2018 SyGuS competition [4], against the official winning 
solver, which we denote LIG [28]—a version of LOOPINVGEN [29] that has been 
extensively tuned for this track. In the competition, there are some invariant- 
synthesis problems where the postcondition itself is a satisfying expression. 
LIG starts with the postcondition as the first candidate and is extremely fast on such programs. For a fair comparison, we added this heuristic to LOOPINVGEN+HE as well. No other change was made to LOOPINVGEN+HE.

LOOPINVGEN solves 115 benchmarks in a total of 2191 seconds, whereas LOOPINVGEN+HE solves 117 benchmarks in 429 seconds, for a mean speedup of over 5×. Moreover, none of the entrants to the competition could solve the two additional benchmarks (gcnr_tacas08 and fib_20) that LOOPINVGEN+HE solves [4].
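As a rough consistency check on the reported figures, the ratio of the two total solving times is shown below; note that the totals cover different numbers of solved benchmarks (115 and 117), so this ratio is only an approximate counterpart of the stated speedup.

$$\frac{2191\ \text{s}}{429\ \text{s}} \approx 5.11$$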


6 Related Work 


The most closely related work to ours investigates overfitting for verification 
tools [36]. Our work differs from theirs in several respects. First, we address 
the problem of overfitting in CEGIS-based synthesis. Second, we formally define 
overfitting and prove that all synthesizers must suffer from it, whereas they only 
observe overfitting empirically. Third, while they use cross-validation to combat 
overfitting in tuning a specific hyperparameter of a verifier, our approach is to 
search for solutions at different expressiveness levels. 

The general problem of efficiently searching a large space of programs for 
synthesis has been explored in prior work. Lee et al. [24] use a probabilistic model, 
learned from known solutions to synthesis problems, to enumerate programs in 
order of their likelihood. Other approaches employ type-based pruning of large 
search spaces [26,32]. These techniques are orthogonal to, and may be combined 
with, our approach of exploring grammar subsets. 

Our results are widely applicable to existing SyGuS tools, but some tools 
fall outside our purview. For instance, in programming-by-example (PBE) sys- 
tems [18, §7], the specification consists of a set of input-output examples. Since 
any program that meets the given examples is a valid satisfying expression, our 
notion of overfitting does not apply to such tools. However, in recent work, Inala
and Singh [19] show that incrementally increasing expressiveness can also aid 
PBE systems. They report that searching within increasingly expressive gram- 
mar subsets requires significantly fewer examples to find expressions that gener- 
alize better over unseen data. Other instances where the synthesizers can have a 
free lunch, i.e., always generate a solution with a small number of counterexamples, include systems that use grammars with limited expressiveness [16,21,35].

Our paper falls in the category of formal results about SyGuS. In one such 
result, Jha and Seshia [22] analyze the effects of different kinds of counterexam- 
ples and of providing bounded versus unbounded memory to learners. Notably, 
they do not consider variations in “concept classes” or “program templates,” 
which are precisely the focus of our study. Therefore, our results are comple- 
mentary: we treat counterexamples and learners as opaque and instead focus on 
grammars. 


7 Conclusion 


Program synthesis is a vibrant research area; new and better synthesizers are 
being built each year. This paper investigates a general issue that affects all 
CEGIS-based SyGuS tools. We recognize the problem of overfitting, formalize it,
and identify the conditions under which it must occur. Furthermore, we provide 
mitigating measures for overfitting that significantly improve the existing tools. 


Acknowledgement. We thank Guy Van den Broeck and the anonymous reviewers for 
helpful feedback for improving this work, and the organizers of the SyGuS competition 
for making the tools and benchmarks publicly available. 

This work was supported in part by the National Science Foundation (NSF) under 
grants CCF-1527923 and CCF-1837129. The lead author was also supported by an 
internship and a PhD Fellowship from Microsoft Research. 


References 


1. The SyGuS Competition (2019). http://sygus.org/comp/. Accessed 10 May 2019
2. Albarghouthi, A., Gulwani, S., Kincaid, Z.: Recursive program synthesis. In: Sharygina, N., Veith, H. (eds.) CAV 2013. LNCS, vol. 8044, pp. 934-950. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-39799-8_67
3. Alur, R., et al.: Syntax-guided synthesis. In: Formal Methods in Computer-Aided Design, FMCAD, pp. 1-8. IEEE (2013). http://ieeexplore.ieee.org/document/6679385/
4. Alur, R., Fisman, D., Padhi, S., Singh, R., Udupa, A.: SyGuS-Comp 2018: Results and Analysis. CoRR abs/1904.07146 (2019). http://arxiv.org/abs/1904.07146
5. Alur, R., Radhakrishna, A., Udupa, A.: Scaling enumerative program synthesis via divide and conquer. In: Legay, A., Margaria, T. (eds.) TACAS 2017. LNCS, vol. 10205, pp. 319-336. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-662-54577-5_18
6. Alur, R., Singh, R., Fisman, D., Solar-Lezama, A.: Search-based program synthesis. Commun. ACM 61(12), 84-93 (2018). https://doi.org/10.1145/3208071
7. Barrett, C., et al.: CVC4. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011. LNCS, vol. 6806, pp. 171-177. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22110-1_14
8. Beyer, D.: Software verification with validation of results. In: Legay, A., Margaria, T. (eds.) TACAS 2017. LNCS, vol. 10206, pp. 331-349. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-662-54580-5_20
9. Bounov, D., DeRossi, A., Menarini, M., Griswold, W.G., Lerner, S.: Inferring loop invariants through gamification. In: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, CHI, p. 231. ACM (2018). https://doi.org/10.1145/3173574.3173805
10. Bradley, A.R., Manna, Z., Sipma, H.B.: The polyranking principle. In: Caires, L., Italiano, G.F., Monteiro, L., Palamidessi, C., Yung, M. (eds.) ICALP 2005. LNCS, vol. 3580, pp. 1349-1361. Springer, Heidelberg (2005). https://doi.org/10.1007/11523468_109
11. Cousot, P., Cousot, R.: Static determination of dynamic properties of generalized type unions. In: Language Design for Reliable Software, pp. 77-94 (1977). https://doi.org/10.1145/800022.808314
12. Cousot, P., Halbwachs, N.: Automatic Discovery of Linear Restraints Among Variables of a Program. In: Conference Record of the Fifth Annual ACM Symposium on Principles of Programming Languages, pp. 84-96. ACM Press (1978). https://doi.org/10.1145/512760.512770


Overfitting in Synthesis: Theory and Practice 333

13. Dietterich, T.G.: Ensemble methods in machine learning. In: Kittler, J., Roli, F. (eds.) MCS 2000. LNCS, vol. 1857, pp. 1-15. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-45014-9_1
14. Ezudheen, P., Neider, D., D’Souza, D., Garg, P., Madhusudan, P.: Horn-ICE learning for synthesizing invariants and contracts. PACMPL 2(OOPSLA), 131:1-131:25 (2018). https://doi.org/10.1145/3276501
15. Feng, Y., Martins, R., Geffen, J.V., Dillig, I., Chaudhuri, S.: Component-based synthesis of table consolidation and transformation tasks from examples. In: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI, pp. 422-436. ACM (2017). https://doi.org/10.1145/3062341.3062351
16. Godefroid, P., Taly, A.: Automated synthesis of symbolic instruction encodings from I/O samples. In: ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI, pp. 441-452. ACM (2012). https://doi.org/10.1145/2254064.2254116
17. Gulwani, S., Jojic, N.: Program verification as probabilistic inference. In: Proceedings of the 34th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL, pp. 277-289. ACM (2007). https://doi.org/10.1145/1190216.1190258
18. Gulwani, S., Polozov, O., Singh, R.: Program synthesis. Found. Trends Program. Lang. 4(1-2), 1-119 (2017). https://doi.org/10.1561/2500000010
19. Inala, J.P., Singh, R.: WebRelate: Integrating Web Data with Spreadsheets using Examples. PACMPL 2(POPL), 2:1-2:28 (2018). https://doi.org/10.1145/3158090
20. Jeon, J., Qiu, X., Solar-Lezama, A., Foster, J.S.: Adaptive concretization for parallel program synthesis. In: Kroening, D., Pasareanu, C.S. (eds.) CAV 2015. LNCS, vol. 9207, pp. 377-394. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-21668-3_22
21. Jha, S., Gulwani, S., Seshia, S.A., Tiwari, A.: Oracle-guided component-based program synthesis. In: Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering, ICSE, vol. 1, pp. 215-224. ACM (2010). https://doi.org/10.1145/1806799.1806833
22. Jha, S., Seshia, S.A.: A theory of formal synthesis via inductive learning. Acta Informatica 54(7), 693-726 (2017). https://doi.org/10.1007/s00236-017-0294-5
23. Le, X.D., Chu, D., Lo, D., Le Goues, C., Visser, W.: S3: syntax- and semantic-guided repair synthesis via programming by examples. In: Proceedings of the 11th Joint Meeting on Foundations of Software Engineering, ESEC/FSE, pp. 593-604. ACM (2017). https://doi.org/10.1145/3106237.3106309
24. Lee, W., Heo, K., Alur, R., Naik, M.: Accelerating search-based program synthesis using learned probabilistic models. In: Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2018, pp. 436-449. ACM (2018). https://doi.org/10.1145/3192366.3192410
25. Miné, A.: The octagon abstract domain. In: Proceedings of the Eighth Working Conference on Reverse Engineering, WCRE, p. 310. IEEE Computer Society (2001). https://doi.org/10.1109/WCRE.2001.957836
26. Osera, P., Zdancewic, S.: Type-and-example-directed program synthesis. In: Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI, pp. 619-630. ACM (2015). https://doi.org/10.1145/2737924.2738007
27. Padhi, S., Millstein, T., Nori, A., Sharma, R.: Overfitting in Synthesis: Theory and Practice. CoRR abs/1905.07457 (2019). https://arxiv.org/pdf/1905.07457


334 S. Padhi et al. 


28. Padhi, S., Sharma, R., Millstein, T.: LoopInvGen: A Loop Invariant Generator 
based on Precondition Inference. CoRR abs/1707.02029 (2018). http: //arxiv.org/ 
abs/1707.02029 

29. Padhi, S., Sharma, R., Millstein, T.D.: Data-driven precondition inference with 
learned features. In: Proceedings of the 37th ACM SIGPLAN Conference on Pro- 
gramming Language Design and Implementation, PLDI, pp. 42-56. ACM (2016). 
https: //doi.org/10.1145/2908080.2908099 

30. Peano, G.: Calcolo geometrico secondo l’Ausdehnungslehre di H. Grassmann: pre- 
ceduto dalla operazioni della logica deduttiva, vol. 3. Fratelli Bocca (1888) 

31. Perelman, D., Gulwani, S., Grossman, D., Provost, P.: Test-driven synthesis. In: 
ACM SIGPLAN Conference on Programming Language Design and Implementa- 
tion, PLDI, pp. 408-418. ACM (2014). https://doi.org/10.1145/2594291.2594297 

32. Polikarpova, N., Kuraj, I., Solar-Lezama, A.: Program synthesis from polymor- 
phic refinement types. In: Proceedings of the 37th ACM SIGPLAN Conference 
on Programming Language Design and Implementation, PLDI, pp. 522-538. ACM 
(2016). https://doi.org/10.1145/2908080.2908093 

33. Reynolds, A., Deters, M., Kuncak, V., Tinelli, C., Barrett, C.: Counterexample- 
guided quantifier instantiation for synthesis in SMT. In: Kroening, D., Păsăreanu, 
C.S. (eds.) CAV 2015. LNCS, vol. 9207, pp. 198-216. Springer, Cham (2015). 
https: //doi.org/10.1007/978-3-319-21668-3 12 

34. Shalev-Shwartz, S., Ben-David, S.: Understanding Machine Learning: From Theory 
to Algorithms. Cambridge University Press, Cambridge (2014) 

35. Sharma, R., Gupta, S., Hariharan, B., Aiken, A., Liang, P., Nori, A.V.: A data 
driven approach for algebraic loop invariants. In: Felleisen, M., Gardner, P. (eds.) 
ESOP 2013. LNCS, vol. 7792, pp. 574-592. Springer, Heidelberg (2013). https:// 
doi.org/10.1007/978-3-642-37036-6_ 31 

36. Sharma, R., Nori, A.V., Aiken, A.: Bias-variance tradeoffs in program analysis. 
In: The 41st Annual ACM SIGPLAN-SIGACT Symposium on Principles of Pro- 
gramming Languages, POPL, pp. 127-138. ACM (2014). https: //doi-org/10.1145/ 
2535838.2535853 

37. Solar-Lezama, A.: Program sketching. STTT 15(5-6), 475-495 (2013) 


Open Access This chapter is licensed under the terms of the Creative Commons 
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), 
which permits use, sharing, adaptation, distribution and reproduction in any medium 
or format, as long as you give appropriate credit to the original author(s) and the 
source, provide a link to the Creative Commons license and indicate if changes were 
made. 

The images or other third party material in this chapter are included in the 
chapter’s Creative Commons license, unless indicated otherwise in a credit line to the 
material. If material is not included in the chapter’s Creative Commons license and 
your intended use is not permitted by statutory regulation or exceeds the permitted 
use, you will need to obtain permission directly from the copyright holder. 




Proving Unrealizability for Syntax-Guided Synthesis


Qinheping Hu¹(✉), Jason Breck¹, John Cyphert¹, Loris D’Antoni¹,
and Thomas Reps¹,²


¹ University of Wisconsin-Madison, Madison, USA
qhu28@wisc.edu
² GrammaTech, Inc., Ithaca, USA


Abstract. We consider the problem of automatically establishing that a given syntax-guided-synthesis (SyGuS) problem is unrealizable (i.e., has no solution). Existing techniques have quite limited ability to establish unrealizability for general SyGuS instances in which the grammar describing the search space contains infinitely many programs. By encoding the synthesis problem’s grammar G as a nondeterministic program $P_G$, we reduce the unrealizability problem to a reachability problem such that, if a standard program-analysis tool can establish that a certain assertion in $P_G$ always holds, then the synthesis problem is unrealizable.

Our method can be used to augment existing SyGuS tools so that they can establish that a successfully synthesized program q is optimal with respect to some syntactic cost, e.g., q has the fewest possible if-then-else operators. Using known techniques, grammar G can be transformed to generate the set of all programs with lower costs than q, e.g., fewer conditional expressions. Our algorithm can then be applied to show that the resulting synthesis problem is unrealizable. We implemented the proposed technique in a tool called NOPE. NOPE can prove unrealizability for 59/132 variants of existing linear-integer-arithmetic SyGuS benchmarks, whereas all existing SyGuS solvers lack the ability to prove that these benchmarks are unrealizable, and time out on them.


1 Introduction 


The goal of program synthesis is to find a program in some search space that meets a specification, e.g., satisfies a set of examples or a logical formula. Recently, a large family of synthesis problems has been unified into a framework called syntax-guided synthesis (SyGuS). A SyGuS problem is specified by a regular-tree grammar that describes the search space of programs, and a logical formula that constitutes the behavioral specification. Many synthesizers now support a specific format for SyGuS problems [1], and compete in annual synthesis competitions [2]. Thanks to these competitions, these solvers are now quite mature and are finding a wealth of applications [9].

Consider the SyGuS problem to synthesize a function f that computes the maximum of two variables x and y, denoted by ($\psi_{\max 2}(f, x, y)$, G1). The goal is to create $e_f$, an expression-tree for f, where $e_f$ is in the language of the following regular-tree grammar G1:


Start ::= Plus(Start, Start) | IfThenElse(BExpr, Start, Start) | x | y | 0 | 1
BExpr ::= GreaterThan(Start, Start) | Not(BExpr) | And(BExpr, BExpr)


and $\forall x, y.\ \psi_{\max 2}([\![e_f]\!], x, y)$ is valid, where $[\![e_f]\!]$ denotes the meaning of $e_f$, and
$\psi_{\max 2}(f, x, y) := f(x,y) \ge x \wedge f(x,y) \ge y \wedge (f(x,y) = x \vee f(x,y) = y)$.
SyGuS solvers can easily find a solution, such as
e := IfThenElse(GreaterThan(x, y), x, y).


Although many solvers can now find solutions efficiently to many SyGuS problems, there has been effectively no work on the much harder task of proving that a given SyGuS problem is unrealizable, i.e., it does not admit a solution. For example, consider the SyGuS problem ($\psi_{\max 2}(f, x, y)$, G2), where G2 is the more restricted grammar with if-then-else operators and conditions stripped out:


Start ::= Plus(Start, Start) | x | y | 0 | 1


This SyGuS problem does not have a solution, because no expression generated by G2 meets the specification.¹ However, to the best of our knowledge, current SyGuS solvers cannot prove that such a SyGuS problem is unrealizable.²

A key property of the previous example is that the grammar is infinite. When such a SyGuS problem is realizable, any search technique that systematically explores the infinite search space of possible programs will eventually identify a solution to the synthesis problem. In contrast, proving that a problem is unrealizable requires showing that every program in the infinite search space fails to satisfy the specification. This problem is in general undecidable [6]. Although we cannot hope to have an algorithm for establishing unrealizability, the challenge is to find a technique that succeeds for the kinds of problems encountered in practice. Existing synthesizers can detect the absence of a solution in certain cases (e.g., because the grammar is finite, or is infinite but only generates a finite number of functionally distinct programs). However, in practice, as our experiments show, this ability is limited: no existing solver was able to show unrealizability for any of the examples considered in this paper.


¹ Grammar G2 only generates terms that are equivalent to some linear function of x and y; however, the maximum function cannot be described by a linear function.
² The synthesis problem presented above is one that is generated by a recent tool called QSyGuS, which extends SyGuS with quantitative syntactic objectives [10]. The advantage of using quantitative objectives in synthesis is that they can be used to produce higher-quality solutions, e.g., smaller, more readable, or more efficient ones. The synthesis problem ($\psi_{\max 2}(f, x, y)$, G2) arises from a QSyGuS problem in which the goal is to produce an expression that (i) satisfies the specification $\psi_{\max 2}(f, x, y)$, and (ii) uses the smallest possible number of if-then-else operators. Existing SyGuS solvers can easily produce a solution that uses one if-then-else operator, but cannot prove that no better solution exists, i.e., that ($\psi_{\max 2}(f, x, y)$, G2) is unrealizable.

In this paper, we present a technique for proving that a possibly infinite SyGuS problem is unrealizable. Our technique builds on two ideas.


1. We observe that unrealizability can often be proven using finitely many input examples. In Sect. 2, we show how the example discussed above can be proven to be unrealizable using four input examples: (0,0), (0,1), (1,0), and (1,1).

2. We devise a way to encode a SyGuS problem ($\psi(f, \vec{x})$, G) over a finite set of examples E as a reachability problem in a recursive program P[G, E]. In particular, the program that we construct has an assertion that holds if and only if the given SyGuS problem is unrealizable. Consequently, unrealizability can be proven by establishing that the assertion always holds. This property can often be established by a conventional program-analysis tool.


The encoding mentioned in item 2 is non-trivial for three reasons. The following list explains each issue, and sketches how it is addressed.


(1) Infinitely many terms. We need to model the infinitely many terms generated by the grammar of a given synthesis problem ($\psi(f, \vec{x})$, G).

To address this issue, we use non-determinism and recursion, and give an encoding P[G, E] such that (i) each non-deterministic path p in the program P[G, E] corresponds to a possible expression $e_p$ that G can generate, and (ii) for each expression e that G can generate, there is a path $p_e$ in P[G, E]. (There is an isomorphism between paths and the expression-trees of G.)

(2) Nondeterminism. We need the computation performed along each path p in P[G, E] to mimic the execution of expression $e_p$. Because the program uses non-determinism, we need to make sure that, for a given path p in the program P[G, E], computational steps are carried out that mimic the evaluation of $e_p$ for each of the finitely many example inputs in E.

We address this issue by threading the expression-evaluation computations associated with each example in E through the same non-deterministic choices.

(3) Complex Specifications. We need to handle specifications that allow for nested calls of the programs being synthesized.

For instance, consider the specification $f(f(x)) = x$. To handle this specification, we introduce a new variable y and rewrite the specification as $f(x) = y \wedge f(y) = x$. Because y is now also used as an input to f, we will thread both the computations of x and y through the non-deterministic recursive program.

Our work makes the following contributions: 


— We reduce the SyGuS unrealizability problem to a reachability problem to which standard program-analysis tools can be applied (Sects. 2 and 4).

— We observe that, for many SyGuS problems, unrealizability can be proven using finitely many input examples, and use this idea to apply the Counter-Example-Guided Inductive Synthesis (CEGIS) algorithm to the problem of proving unrealizability (Sect. 3).

— We give an encoding of a SyGuS problem ($\psi(f, \vec{x})$, G) over a finite set of examples E as a reachability problem in a nondeterministic recursive program P[G, E], which has the following property: if a certain assertion in P[G, E] always holds, then the synthesis problem is unrealizable (Sect. 4).

— We implement our technique in a tool NOPE using the ESolver synthesizer [2] as the SyGuS solver and the SeaHorn tool [8] for checking reachability. NOPE is able to establish unrealizability for 59 out of 132 variants of benchmarks taken from the SyGuS competition. In particular, NOPE solves all benchmarks with no more than 15 productions in the grammar and requiring no more than 9 input examples for proving unrealizability. Existing SyGuS solvers lack the ability to prove that these benchmarks are unrealizable, and time out on them.


Section 6 discusses related work. Some additional technical material, proofs, and 
full experimental results are given in [13]. 


2 Illustrative Example 


In this section, we illustrate the main components of our framework for establishing the unrealizability of a SyGuS problem.

As an example, we use the problem ($\psi_{\max 2}(f, x, y)$, G2) discussed in Sect. 1,³ and show how unrealizability can be proven using four input examples: (0,0), (0,1), (1,0), and (1,1).


³ Grammar G2 generates an infinite number of functionally distinct programs, each equivalent to a linear function of x and y; however, the maximum function cannot be described by a linear function.


 1  int I_0;
 2  void Start(int x_0, int y_0){
 3    if(nd()){ // Encodes "Start ::= Plus(Start, Start)"
 4      Start(x_0, y_0);
 5      int tempL_0 = I_0;
 6      Start(x_0, y_0);
 7      int tempR_0 = I_0;
 8      I_0 = tempL_0 + tempR_0;
 9    }
10    else if(nd()) I_0 = x_0; // Encodes "Start ::= x"
11    else if(nd()) I_0 = y_0; // Encodes "Start ::= y"
12    else if(nd()) I_0 = 1;   // Encodes "Start ::= 1"
13    else I_0 = 0;            // Encodes "Start ::= 0"
14  }
15
16  bool spec(int x, int y, int f){
17    return (f>=x && f>=y && (f==x || f==y));
18  }
19
20  void main(){
21    int x_0 = 0; int y_0 = 1; // Input example (0,1)
22    Start(x_0, y_0);
23    assert(!spec(x_0, y_0, I_0));
24  }


Fig. 1. Program P[G2, E1] created during the course of proving the unrealizability of ($\psi_{\max 2}(f, x, y)$, G2) using the set of input examples E1 = {(0,1)}.


Our method can be seen as a variant of Counter-Example-Guided Inductive Synthesis (CEGIS), in which the goal is to create a program P in which a certain assertion always holds. Until such a program is created, each round of the algorithm returns a counter-example, from which we extract an additional input example for the original SyGuS problem. On the i-th round, the current set of input examples E_i is used, together with the grammar (in this case G2) and the specification of the desired behavior, $\psi_{\max 2}(f, x, y)$, to create a candidate program P[G2, E_i]. The program P[G2, E_i] contains an assertion, and a standard program analyzer is used to check whether the assertion always holds.

Suppose that for the SyGuS problem ($\psi_{\max 2}(f, x, y)$, G2) we start with just the one example input (0,1), i.e., E1 = {(0,1)}. Figure 1 shows the initial program P[G2, E1] that our method creates. The function spec implements the predicate $\psi_{\max 2}(f, x, y)$. (All of the programs {P[G2, E_i]} use the same function spec.) The initialization statements “int x_0 = 0; int y_0 = 1;” at line (21) in procedure main correspond to the input example (0,1). The recursive procedure Start encodes the productions of grammar G2. Start is non-deterministic; it contains four calls to an external function nd(), which returns a non-deterministically chosen Boolean value. The calls to nd() can be understood as controlling whether or not a production is selected from G2 during a top-down, left-to-right generation of an expression-tree: lines (3)–(8) correspond to “Start ::= Plus(Start, Start),” and lines (10), (11), (12), and (13) correspond to “Start ::= x,” “Start ::= y,” “Start ::= 1,” and “Start ::= 0,” respectively. The code in the five cases in the body of Start encodes the semantics of the respective production of G2; in particular, the statements that are executed along any execution path of P[G2, E1] implement the bottom-up evaluation of some expression-tree that can be generated by G2. For instance, consider the path that visits statements in the following order (for brevity, some statement numbers have been elided):


$21\;22\;(_{\mathit{Start}}\;3\;4\;(_{\mathit{Start}}\;10\;)_{\mathit{Start}}\;6\;(_{\mathit{Start}}\;12\;)_{\mathit{Start}}\;8\;)_{\mathit{Start}}\;23$,     (1)


where $(_{\mathit{Start}}$ and $)_{\mathit{Start}}$ indicate entry to, and return from, procedure Start, respectively. Path (1) corresponds to the top-down, left-to-right generation of the expression-tree Plus(x, 1), interleaved with the tree’s bottom-up evaluation.

Note that with path (1), when control returns to main, variable I_0 has the value 1 (because x_0 = 0), and thus the assertion at line (23) fails: spec(0, 1, 1) is true, since 1 is the maximum of 0 and 1.

A sound program analyzer will discover that some such path exists in the program, and will return the sequence of non-deterministic choices required to follow one such path. Suppose that the analyzer chooses to report path (1); the sequence of choices would be t, f, t, f, f, f, t, which can be decoded to create the expression-tree Plus(x, 1). At this point, we have a candidate definition for f: $f = x + 1$. This formula can be checked using an SMT solver to see whether it satisfies the behavioral specification $\psi_{\max 2}(f, x, y)$. In this case, the SMT solver returns “false.” One counter-example that it could return is (0,0): there f returns 1, which equals neither input.

At this point, program P[G2, E2] would be constructed using both of the 
example inputs (0, 1) and (0,0). Rather than describe P[G2, E2], we will describe 
the final program constructed, P[G2, E4] (see Fig. 2). 

As can be seen from the comments in the two programs, program P[G2, E4] 
has the same basic structure as P[G2, E1]. 


— main begins with initialization statements for the four example inputs. 
— Start has five cases that correspond to the five productions of G2.


The main difference is that because the encoding of G2 in Start uses non-determinism, we need to make sure that along each path p in P[G2, E4], each of the example inputs is used to evaluate the same expression-tree. We address this issue by threading the expression-evaluation computations associated with each of the example inputs through the same non-deterministic choices. That is, each of the five “production cases” in Start has four encodings of the production’s semantics, one for each of the four expression evaluations. By this means, the statements that are executed along path p perform four simultaneous bottom-up evaluations of the expression-tree from G2 that corresponds to p.
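The threading idea can be made concrete with a small runnable C sketch of P[G2, E4], written in the style of the program with nd() replaced by nextbit() from Sect. 4.1. This is our own illustration, not code generated by NOPE. The arrays EX_X, EX_Y, and I_vals and the function assertion_holds are hypothetical names. For brevity, the example inputs are read from globals instead of being passed as parameters as in Fig. 2. Because all four evaluations consume the same choice bits, every example evaluates the same expression-tree.

```c
#include <stdbool.h>

/* Hypothetical sketch of P[G2, E4] with nd() replaced by nextbit(), which
   consumes the low-order bit of the choice word n (see Sect. 4.1). */
#define K 4                                /* number of input examples */
static const int EX_X[K] = {0, 0, 1, 1};   /* examples in the order of Fig. 2: */
static const int EX_Y[K] = {1, 0, 1, 0};   /* (0,1), (0,0), (1,1), (1,0)       */

static unsigned long n;                    /* remaining non-deterministic choices */
static int I_vals[K];                      /* I_vals[j]: value of last subtree on example j */

static bool nextbit(void) { bool b = n % 2; n = n >> 1; return b; }

static void Start(void) {
    if (nextbit()) {                       /* Start ::= Plus(Start, Start) */
        int tempL[K], tempR[K];
        Start();
        for (int j = 0; j < K; j++) tempL[j] = I_vals[j];
        Start();
        for (int j = 0; j < K; j++) tempR[j] = I_vals[j];
        for (int j = 0; j < K; j++) I_vals[j] = tempL[j] + tempR[j];
    } else if (nextbit()) {                /* Start ::= x */
        for (int j = 0; j < K; j++) I_vals[j] = EX_X[j];
    } else if (nextbit()) {                /* Start ::= y */
        for (int j = 0; j < K; j++) I_vals[j] = EX_Y[j];
    } else if (nextbit()) {                /* Start ::= 1 */
        for (int j = 0; j < K; j++) I_vals[j] = 1;
    } else {                               /* Start ::= 0 */
        for (int j = 0; j < K; j++) I_vals[j] = 0;
    }
}

static bool spec(int x, int y, int f) { return f >= x && f >= y && (f == x || f == y); }

/* Follows the path selected by `choices`, copies the four results into out,
   and returns true iff the assertion of Fig. 2 (some spec fails) holds. */
bool assertion_holds(unsigned long choices, int out[K]) {
    n = choices;
    Start();
    bool all_satisfied = true;
    for (int j = 0; j < K; j++) {
        out[j] = I_vals[j];
        if (!spec(EX_X[j], EX_Y[j], I_vals[j])) all_satisfied = false;
    }
    return !all_satisfied;
}
```

For instance, the choice word 69 selects Plus(x, 1) and yields the values 1, 1, 2, 2 on the four examples. The assertion holds because spec fails on (0,0). Trying choice words one at a time only samples paths, so this is no substitute for the program analyzer’s proof.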

Programs P[G2, E2] and P[G2, E3] are similar to P[G2, E4], but their paths carry out two and three simultaneous bottom-up evaluations, respectively. The actions taken during rounds 2 and 3 to generate a new counter-example, and hence a new example input, are similar to what was described for round 1. On round 4, however, the program analyzer will determine that the assertion on lines (34)–(35) always holds, which means that there is no path through P[G2, E4] for which the behavioral specification holds for all of the input examples. This property means that there is no expression-tree that satisfies the specification, i.e., the SyGuS problem ($\psi_{\max 2}(f, x, y)$, G2) is unrealizable.


 1  int I_0, I_1, I_2, I_3;
 2  void Start(int x_0,int y_0,...,int x_3,int y_3){
 3    if(nd()){ // Encodes "Start ::= Plus(Start, Start)"
 4      Start(x_0, y_0, x_1, y_1, x_2, y_2, x_3, y_3);
 5      int tempL_0 = I_0; int tempL_1 = I_1;
 6      int tempL_2 = I_2; int tempL_3 = I_3;
 7      Start(x_0, y_0, x_1, y_1, x_2, y_2, x_3, y_3);
 8      int tempR_0 = I_0; int tempR_1 = I_1;
 9      int tempR_2 = I_2; int tempR_3 = I_3;
10      I_0 = tempL_0 + tempR_0;
11      I_1 = tempL_1 + tempR_1;
12      I_2 = tempL_2 + tempR_2;
13      I_3 = tempL_3 + tempR_3;}
14    else if(nd()) { // Encodes "Start ::= x"
15      I_0 = x_0; I_1 = x_1; I_2 = x_2; I_3 = x_3;}
16    else if(nd()) { // Encodes "Start ::= y"
17      I_0 = y_0; I_1 = y_1; I_2 = y_2; I_3 = y_3;}
18    else if(nd()) { // Encodes "Start ::= 1"
19      I_0 = 1; I_1 = 1; I_2 = 1; I_3 = 1;}
20    else { // Encodes "Start ::= 0"
21      I_0 = 0; I_1 = 0; I_2 = 0; I_3 = 0;}
22  }
23
24  bool spec(int x, int y, int f){
25    return (f>=x && f>=y && (f==x || f==y));
26  }
27
28  void main(){
29    int x_0 = 0; int y_0 = 1; // Input example (0,1)
30    int x_1 = 0; int y_1 = 0; // Input example (0,0)
31    int x_2 = 1; int y_2 = 1; // Input example (1,1)
32    int x_3 = 1; int y_3 = 0; // Input example (1,0)
33    Start(x_0,y_0,x_1,y_1,x_2,y_2,x_3,y_3);
34    assert(!spec(x_0,y_0,I_0) || !spec(x_1,y_1,I_1)
35           || !spec(x_2,y_2,I_2) || !spec(x_3,y_3,I_3));
36  }


Fig. 2. Program P[G2, E4] created during the course of proving the unrealizability of ($\psi_{\max 2}(f, x, y)$, G2) using the set of input examples E4 = {(0,0), (0,1), (1,0), (1,1)}.

Our implementation uses the program-analysis tool SeaHorn [8] as the assertion checker. In the case of P[G2, E4], SeaHorn takes only 0.5 s to establish that the assertion in P[G2, E4] always holds.
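For intuition, the unrealizability of ($\psi_{\max 2}(f, x, y)$, G2) on E4 can also be checked by hand. The argument below is our own sketch and is not part of the method. It relies on the fact noted in the footnotes that every expression in L(G2) denotes $a x + b y + c$ with natural-number coefficients. It also uses the fact that $\psi_{\max 2}$ forces $f(x, y) = \max(x, y)$ at each example.

```latex
\begin{aligned}
&\text{Every } e \in L(G_2) \text{ denotes } [\![e]\!](x, y) = a x + b y + c \text{ for some } a, b, c \in \mathbb{N}.\\
&\psi_{\max 2} \text{ forces } f(x, y) = \max(x, y) \text{ on each example, so:}\\
&(0,0):\ c = 0 \qquad (0,1):\ b + c = 1 \;\Rightarrow\; b = 1 \qquad (1,0):\ a + c = 1 \;\Rightarrow\; a = 1\\
&(1,1):\ a + b + c = 1 \text{ is required, but } a + b + c = 2.
\end{aligned}
```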
3 SyGuS, Realizability, and CEGIS


3.1 Background 


Trees and Tree Grammars. A ranked alphabet is a tuple $(\Sigma, rk_\Sigma)$ where $\Sigma$ is a finite set of symbols and $rk_\Sigma : \Sigma \to \mathbb{N}$ associates a rank to each symbol. For every $m \ge 0$, the set of all symbols in $\Sigma$ with rank m is denoted by $\Sigma^{(m)}$. In our examples, a ranked alphabet is specified by showing the set $\Sigma$ and attaching the respective rank to every symbol as a superscript, e.g., $\Sigma = \{+^{(2)}, c^{(0)}\}$. (For brevity, the superscript is sometimes omitted.) We use $T_\Sigma$ to denote the set of all (ranked) trees over $\Sigma$, i.e., $T_\Sigma$ is the smallest set such that (i) $\Sigma^{(0)} \subseteq T_\Sigma$, and (ii) if $\sigma^{(k)} \in \Sigma^{(k)}$ and $t_1, \ldots, t_k \in T_\Sigma$, then $\sigma^{(k)}(t_1, \ldots, t_k) \in T_\Sigma$. In what follows, we assume a fixed ranked alphabet $(\Sigma, rk_\Sigma)$.

In this paper, we focus on typed regular tree grammars, in which each non-terminal and each symbol is associated with a type. There is a finite set of types $\{\tau_1, \ldots, \tau_k\}$. Associated with each symbol $\sigma^{(i)} \in \Sigma$, there is a type assignment $a(\sigma^{(i)}) = (\tau_0, \tau_1, \ldots, \tau_i)$, where $\tau_0$ is called the left-hand-side type and $\tau_1, \ldots, \tau_i$ are called the right-hand-side types. Tree grammars are similar to word grammars, but generate trees over a ranked alphabet instead of words.


Definition 1 (Regular Tree Grammar). A typed regular tree grammar (RTG) is a tuple $G = (N, \Sigma, S, a, \delta)$, where N is a finite set of non-terminal symbols of arity 0; $\Sigma$ is a ranked alphabet; $S \in N$ is an initial non-terminal; a is a type assignment that gives types for members of $\Sigma \cup N$; and $\delta$ is a finite set of productions of the form $A_0 \to \sigma^{(i)}(A_1, \ldots, A_i)$, where for $1 \le j \le i$, each $A_j \in N$ is a non-terminal such that if $a(\sigma^{(i)}) = (\tau_0, \tau_1, \ldots, \tau_i)$ then $a(A_j) = \tau_j$.


In a SYGuS problem, each variable, such as x and y in the example RTGs 
in Sect. 1, is treated as an arity-0 symbol—i.e., 2 and y. 

Given a tree $t \in T_{\Sigma \cup N}$, applying a production $r = A \to \beta$ to t produces the tree $t'$ resulting from replacing the left-most occurrence of A in t with the right-hand side $\beta$. A tree $t \in T_\Sigma$ is generated by the grammar G, denoted by $t \in L(G)$, iff it can be obtained by applying a sequence of productions $r_1 \cdots r_n$ to the tree whose root is the initial non-terminal S.
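Leftmost derivation can be illustrated with derive, a hypothetical C helper for grammar G2, which has the single non-terminal Start. For brevity it assumes that sentential forms are plain strings; a real implementation would manipulate tree nodes. Each step rewrites the left-most occurrence of Start with the right-hand side of the chosen production.

```c
#include <stdbool.h>
#include <string.h>

/* Right-hand sides of G2's productions; all have left-hand side Start. */
static const char *const G2_RHS[] = { "Plus(Start,Start)", "x", "y", "0", "1" };
#define N_RULES ((int)(sizeof G2_RHS / sizeof G2_RHS[0]))
#define NT "Start"
#define NT_LEN 5

/* Starting from the tree Start, applies productions rules[0..nrules-1]
   (indices into G2_RHS), each to the left-most occurrence of Start.
   Returns 0 on success, -1 on a bad index, when no Start is left to
   rewrite, or when out (capacity cap) is too small. */
int derive(const int *rules, int nrules, char *out, size_t cap) {
    if (cap < NT_LEN + 1) return -1;
    strcpy(out, NT);
    for (int i = 0; i < nrules; i++) {
        if (rules[i] < 0 || rules[i] >= N_RULES) return -1;
        char *hole = strstr(out, NT);              /* left-most occurrence */
        if (hole == NULL) return -1;
        const char *rhs = G2_RHS[rules[i]];
        size_t len = strlen(out), rlen = strlen(rhs);
        if (len - NT_LEN + rlen + 1 > cap) return -1;
        /* shift the tail (including '\0'), then write the right-hand side */
        memmove(hole + rlen, hole + NT_LEN, strlen(hole + NT_LEN) + 1);
        memcpy(hole, rhs, rlen);
    }
    return 0;
}

/* A derived tree is in L(G2) only if no non-terminal remains. */
bool in_language(const char *tree) { return strstr(tree, NT) == NULL; }
```

The production sequence Plus, x, 1 (indices 0, 1, 4) derives Plus(x,1), the tree used in Sect. 2. A sequence that stops early, such as Plus, y, leaves Start behind. The result is then a tree in $T_{\Sigma \cup N}$ but not in L(G2).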


Syntax-Guided Synthesis. A SyGuS problem is specified with respect to a background theory T, e.g., linear arithmetic, and the goal is to synthesize a function f that satisfies two constraints provided by the user. The first constraint, $\psi(f, \vec{x})$, describes a semantic property that f should satisfy. The second constraint limits the search space S of f, and is given as a set of expressions specified by an RTG G that defines a subset of all terms in T.


Definition 2 (SyGuS). A SyGuS problem over a background theory T is a pair $sy = (\psi(f, \vec{x}), G)$ where G is a regular tree grammar that only contains terms in T, i.e., $L(G) \subseteq T$, and $\psi(f, \vec{x})$ is a Boolean formula constraining the semantic behavior of the synthesized program f.

A SyGuS problem is realizable if there exists an expression $e \in L(G)$ such that $\forall \vec{x}.\ \psi([\![e]\!], \vec{x})$ is true. Otherwise we say that the problem is unrealizable.
Theorem 1 (Undecidability [6]). Given a SyGuS problem sy, it is undecidable to check whether sy is realizable.


Counterexample-Guided Inductive Synthesis. The Counterexample-Guided Inductive Synthesis (CEGIS) algorithm is a popular approach to solving synthesis problems. Instead of directly looking for an expression that satisfies the specification $\psi$ on all possible inputs, the CEGIS algorithm uses a synthesizer S that can find expressions that are correct on a finite set of examples E. If S finds a solution that is correct on all elements of E, CEGIS uses a verifier V to check whether the discovered solution is also correct for all possible inputs to the problem. If not, a counterexample obtained from V is added to the set of examples, and the process repeats. More formally, CEGIS starts with an empty set of examples E and repeats the following steps:


1. Call the synthesizer S to find an expression e such that $\psi^E([\![e]\!], \vec{x}) = \forall \vec{x} \in E.\ \psi([\![e]\!], \vec{x})$ holds and go to step 2; return unrealizable if no expression exists.

2. Call the verifier V to find a model c for the formula $\neg\psi([\![e]\!], \vec{x})$, and add c to the counterexample set E; return e as a valid solution if no model is found.
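The two steps above can be sketched as a loop. The C code below is a hypothetical, bounded illustration and not the paper’s machinery. The synthesizer S enumerates a finite stand-in for L(G2): the functions $a x + b y + c$ with $0 \le a, b, c \le 3$. Every expression of G2 denotes such a function with natural-number coefficients, but G2 itself is infinite. The verifier V searches the grid $[0, 3]^2$ instead of calling an SMT solver. The names synth, find_cex, and cegis, and the bounds MAXC and GRID, are our own assumptions.

```c
#include <stdbool.h>

/* Hypothetical bounded stand-ins (not the paper's tools):
   S enumerates f(x,y) = a*x + b*y + c with 0 <= a,b,c <= MAXC,
   V searches inputs in [0, GRID]^2 instead of calling an SMT solver. */
#define MAXC 3
#define GRID 3
#define MAXEX 16

typedef struct { int x, y; } Example;
typedef struct { int a, b, c; } Cand;
typedef enum { REALIZABLE, UNREALIZABLE, GAVE_UP } Verdict;

static int eval(Cand f, int x, int y) { return f.a * x + f.b * y + f.c; }

static bool psi_max2(int f, int x, int y) {
    return f >= x && f >= y && (f == x || f == y);
}

/* Synthesizer S: first candidate that satisfies psi_max2 on every example in E. */
static bool synth(const Example *E, int nE, Cand *out) {
    for (int a = 0; a <= MAXC; a++)
        for (int b = 0; b <= MAXC; b++)
            for (int c = 0; c <= MAXC; c++) {
                Cand f = {a, b, c};
                bool ok = true;
                for (int i = 0; i < nE && ok; i++)
                    ok = psi_max2(eval(f, E[i].x, E[i].y), E[i].x, E[i].y);
                if (ok) { *out = f; return true; }
            }
    return false;
}

/* Verifier V: returns true and sets *cex if f violates psi_max2 on the grid. */
static bool find_cex(Cand f, Example *cex) {
    for (int x = 0; x <= GRID; x++)
        for (int y = 0; y <= GRID; y++)
            if (!psi_max2(eval(f, x, y), x, y)) {
                cex->x = x; cex->y = y;
                return true;
            }
    return false;
}

/* CEGIS: start from E = {} and alternate S and V. */
Verdict cegis(Example E[MAXEX], int *nE, Cand *sol) {
    *nE = 0;
    for (;;) {
        Cand f;
        if (!synth(E, *nE, &f)) return UNREALIZABLE;            /* step 1: nothing fits E */
        Example c;
        if (!find_cex(f, &c)) { *sol = f; return REALIZABLE; }  /* step 2: no model */
        if (*nE == MAXEX) return GAVE_UP;
        E[(*nE)++] = c;                                         /* add the counterexample */
    }
}
```

Starting from E = {}, the loop collects the counterexamples (0,1), (0,0), (1,0), and (1,1). After that, no candidate fits E and the sketch reports UNREALIZABLE. That verdict concerns only the finite stand-in space. Establishing it for all of L(G2) requires the reduction of Sect. 4.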


Because SyGuS problems are only defined over first-order decidable theories, any SMT solver can be used as the verifier V to check whether the formula $\neg\psi([\![e]\!], \vec{x})$ is satisfiable. On the other hand, providing a synthesizer S to find solutions such that $\forall \vec{x} \in E.\ \psi([\![e]\!], \vec{x})$ holds is a much harder problem because e is a second-order term drawn from an infinite search space. In fact, checking whether such an e exists is an undecidable problem [6].

The main contribution of our paper is a reduction of the unrealizability problem (i.e., the problem of proving that there is no expression $e \in L(G)$ such that $\forall \vec{x} \in E.\ \psi([\![e]\!], \vec{x})$ holds) to an unreachability problem (Sect. 4). This reduction allows us to use existing (un)reachability verifiers to check whether a SyGuS instance is unrealizable.


3.2 CEGIS and Unrealizability 


The CEGIS algorithm is sound but incomplete for proving unrealizability. Given a SyGuS problem $sy = (\psi(f, \vec{x}), G)$ and a finite set of inputs E, we denote with $sy^E := (\psi^E(f, \vec{x}), G)$ the corresponding SyGuS problem that only requires the function f to be correct on the examples in E.


Lemma 1 (Soundness). If $sy^E$ is unrealizable then sy is unrealizable.


Even when given a perfect synthesizer S, i.e., one that can solve a problem $sy^E$ for every possible set E, there are SyGuS problems for which the CEGIS algorithm is not powerful enough to prove unrealizability.


Lemma 2 (Incompleteness). There exists an unrealizable SyGuS problem sy such that for every finite set of examples E the problem $sy^E$ is realizable.


Despite this negative result, we will show that a CEGIS algorithm can prove unrealizability for many SyGuS instances (Sect. 5).
4 From Unrealizability to Unreachability


In this section, we show how a SyGuS problem for finitely many examples can be reduced to a reachability problem in a non-deterministic, recursive program in an imperative programming language.


4.1 Reachability Problems 


A program P takes an initial state I as input and outputs a final state O, i.e., $[\![P]\!](I) = O$ where $[\![\cdot]\!]$ denotes the semantic function of the programming language. As illustrated in Sect. 2, we allow a program to contain calls to an external function nd(), which returns a non-deterministically chosen Boolean value. When program P contains calls to nd(), we use $\hat{P}$ to denote the program that is the same as P except that $\hat{P}$ takes an additional integer input n, and each call nd() is replaced by a call to a local function nextbit() defined as follows:


bool nextbit(){bool b = n%2; n=n>>1; return b;}


In other words, the integer parameter n of $\hat{P}[n]$ formalizes all of the non-deterministic choices made by P in calls to nd().

For the programs P[G, E] used in our unrealizability algorithm, the only 
calls to nd() are ones that control whether or not a production is selected from 
grammar G during a top-down, left-to-right generation of an expression-tree. 
Given n, we can decode it to identify which expression-tree n represents. 


Example 1. Consider again the SyGuS problem ($\psi_{\max 2}(f, x, y)$, G2) discussed in Sect. 2. In the discussion of the initial program P[G2, E1] (Fig. 1), we hypothesized that the program analyzer chose to report path (1) in P, for which the sequence of non-deterministic choices is t, f, t, f, f, f, t. That sequence means that for $\hat{P}[n]$, the value of n is 1000101 (base 2) (or 69 (base 10)). The 1s, from low-order to high-order position, represent choices of production instances in a top-down, left-to-right generation of an expression-tree. (The 0s represent rejected possible choices.) The rightmost 1 in n corresponds to the choice in line (3) of “Start ::= Plus(Start, Start)”; the 1 in the third-from-rightmost position corresponds to the choice in line (10) of “Start ::= x” as the left child of the Plus node; and the 1 in the leftmost position corresponds to the choice in line (12) of “Start ::= 1” as the right child. By this means, we learn that the behavioral specification $\psi_{\max 2}(f, x, y)$ holds for the example set E1 = {(0,1)} for $f \mapsto$ Plus(x, 1).


Definition 3 (Reachability Problem). Given a program $\hat{P}[n]$, containing assertion statements and a non-deterministic integer input n, we use $re_P$ to denote the corresponding reachability problem. The reachability problem $re_P$ is satisfiable if there exists a value that, when bound to n, falsifies at least one of the assertions in $\hat{P}[n]$. The problem is unsatisfiable otherwise.
4.2 Reduction to Reachability


The main component of our framework is an encoding enc that, given a SyGuS
problem sy^E = (ψ^E(f, x), G) over a set of examples E = {c1, ..., ck}, outputs
a program P[G, E] such that sy^E is realizable if and only if re_enc(sy,E) is
satisfiable. In this section, we define all the components of P[G, E], and state
the correctness properties of our reduction.

Remark: In this section, we assume that in the specification ψ^E(f, x) every
occurrence of f has x as input parameter. We show how to overcome this restriction
in App. A [13]. In the following, we assume that the input x has type τ_I, where
τ_I could be a complex type, e.g., a tuple type.

Program Construction. Recall that the grammar G is a tuple (N, Σ, S, a, δ). First,
for each non-terminal A ∈ N, the program P[G, E] contains k global variables
{g_1_A, ..., g_k_A} of type a(A) that are used to express the values resulting
from evaluating expressions generated from non-terminal A on the k examples.
Second, for each non-terminal A ∈ N, the program P[G, E] contains a function


void funcA(τ_I v1, ..., τ_I vk) { bodyA }


We denote by δ(A) = {r1, ..., rm} the set of production rules of the form
A → β in δ. The body bodyA of funcA has the following structure:


if(nd()) { Enδ(r1) }
else if(nd()) { Enδ(r2) }
...
else { Enδ(rm) }


The encoding Enδ(r) of a production r = A0 → b^(j)(A1, ..., Aj) is defined
as follows (τi denotes the type of the term Ai):


funcA1(v1, ..., vk);
τ1 child_1_1 = g_1_A1; ...; τ1 child_1_k = g_k_A1;
...
funcAj(v1, ..., vk);
τj child_j_1 = g_1_Aj; ...; τj child_j_k = g_k_Aj;
g_1_A0 = enc_b^1(child_1_1, ..., child_j_1);
...
g_k_A0 = enc_b^k(child_1_k, ..., child_j_k);


Note that if b^(j) is of arity 0 (i.e., if j = 0), the construction yields k assignments
of the form g_m_A0 = enc_b^m().

The function enc_b^m interprets the semantics of b on the m-th input example.
We take Linear Integer Arithmetic as an example to illustrate how enc_b^m works.


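$$
\begin{array}{ll}
\mathit{enc}^m_{0}() := 0 & \mathit{enc}^m_{1}() := 1\\
\mathit{enc}^m_{x}() := v_m & \mathit{enc}^m_{\mathit{Equals}(=)}(L, R) := (L == R)\\
\mathit{enc}^m_{\mathit{Plus}(+)}(L, R) := L + R & \mathit{enc}^m_{\mathit{Minus}(-)}(L, R) := L - R\\
\mathit{enc}^m_{\mathit{IfThenElse}}(B, L, R) := \texttt{if}(B)\ L\ \texttt{else}\ R &
\end{array}
$$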
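
The assignments g_m_A0 = enc_b^m(child_1_m, ..., child_j_m) mean that each non-terminal carries one value per example, and that the m-th value of a parent depends only on the m-th values of its children. Below is a minimal Python sketch of that per-example application for the LIA operators in the table, assuming each example input is a tuple whose first component is x; Equals returns 0/1 rather than a Boolean purely to keep all values integers. The names LEAVES, OPS, and apply_production are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Values = Tuple[int, ...]  # (g_1_A, ..., g_k_A): one value per example

# enc_b^m for leaves: each receives the m-th input (x assumed at index 0).
LEAVES: Dict[str, Callable[[Sequence[int]], int]] = {
    "0": lambda v: 0,
    "1": lambda v: 1,
    "x": lambda v: v[0],
}
# enc_b^m for operators: each receives the m-th value of every child.
OPS: Dict[str, Callable[..., int]] = {
    "Plus": lambda a, b: a + b,
    "Minus": lambda a, b: a - b,
    "Equals": lambda a, b: int(a == b),
    "IfThenElse": lambda c, a, b: a if c else b,
}


def apply_production(label: str, children: List[Values],
                     inputs: List[Sequence[int]]) -> Values:
    """Compute (g_1_A0, ..., g_k_A0) for a production A0 -> label(children)."""
    k = len(inputs)
    if not children:  # arity 0: g_m_A0 = enc_b^m()
        return tuple(LEAVES[label](inputs[m]) for m in range(k))
    # g_m_A0 = enc_b^m(child_1_m, ..., child_j_m)
    return tuple(OPS[label](*(c[m] for c in children)) for m in range(k))


inputs = [(0, 1), (3, 2)]                     # two hypothetical examples (x, y)
x = apply_production("x", [], inputs)         # (0, 3)
one = apply_production("1", [], inputs)       # (1, 1)
print(apply_production("Plus", [x, one], inputs))  # (1, 4)
```
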
We now turn to the correctness of the construction. First, we formalize the
relationship between expression-trees in L(G), the semantics of P[G, E], and
the number n. Given an expression-tree e, we assume that each node q in e is
annotated with the production that has produced that node. Recall that δ(A) =
{r1, ..., rm} is the set of productions with head A (where the subscripts are
indexes in some arbitrary, but fixed order). Concretely, for every node q, we
assume there is a function pr(q) = (A, i), which associates q with a pair that
indicates that non-terminal A produced q using the production ri (i.e., ri is the
i-th production whose left-hand-side non-terminal is A).

We now define how we can extract a number #(e) for which the program 
P[#(e)] will exhibit the same semantics as that of the expression-tree e. First, 
for every node q in e such that pr(q) = (A, i), we define the following number: 


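$$
\#_{nd}(q) =
\begin{cases}
1\,\underbrace{0\cdots0}_{i-1} & \text{if } i < |\delta(A)|\\[4pt]
\underbrace{0\cdots0}_{i-1} & \text{if } i = |\delta(A)|
\end{cases}
$$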


The number #nd(q) indicates what suffix of the value of n will cause funcA to
trigger the code corresponding to production ri. Let q1, ..., qm be the sequence of
nodes visited during a pre-order traversal of expression-tree e. The number
corresponding to e, denoted by #(e), is defined as the bit-vector #nd(qm) ··· #nd(q1).
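
The definition of #(e) can be checked against Example 1 with a short Python sketch. Nodes are written as (i, |δ(A)|, children) with a 1-based production index i. The indices used below (Plus = 1, x = 2, 1 = 4) are read off the bit pattern of Example 1, while the total of five Start productions is an assumption; the helper names are hypothetical.

```python
from typing import List, Tuple

# A node is (production index i, number of productions of its head, children).
Node = Tuple[int, int, list]


def nd_bits(i: int, num_prods: int) -> str:
    """#_nd(q) as a bit string written high-order first."""
    if not 1 <= i <= num_prods:
        raise ValueError("production index out of range")
    zeros = "0" * (i - 1)
    return "1" + zeros if i < num_prods else zeros


def preorder(node: Node) -> List[Node]:
    _, _, children = node
    out = [node]
    for child in children:
        out.extend(preorder(child))
    return out


def number_of(tree: Node) -> int:
    """#(e) = #_nd(q_m) ... #_nd(q_1) for the pre-order q_1, ..., q_m."""
    bits = "".join(nd_bits(i, k) for i, k, _ in reversed(preorder(tree)))
    return int(bits, 2) if bits else 0


plus_x_1 = (1, 5, [(2, 5, []), (4, 5, [])])  # Plus(x, 1)
print(number_of(plus_x_1))  # 69
```

The pieces 1000 (for 1), 10 (for x), and 1 (for Plus) concatenate to 1000101 (base 2), i.e., 69, as in Example 1.
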
Finally, we add the entry point of the program, which calls the function funcS
corresponding to the initial non-terminal S, and contains the assertion that
encodes our reachability problem on all the input examples E = {c1, ..., ck}.


void main() {
  τ_I x1 = c1; ...; τ_I xk = ck;
  funcS(x1, ..., xk);
  assert ∨_{1≤i≤k} ¬ψ(f, ci)[g_i_S/f(x)]; // At least one ci fails
}
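
To see how main() and the calls to funcS interact, here is a small Python simulation of P[G, E] for the max2 example, under an assumed production order Plus, x, y, 1, 0 for Start. It is a sketch, not the generated program: psi_max2 (the result is at least x and y and equals one of them) is our assumed rendering of ψ_max2, and find_witness is a hypothetical bounded brute-force search over n that stands in for a reachability verifier.

```python
from typing import List, Optional, Tuple

Example = Tuple[int, int]   # one input example (x, y)
Vals = Tuple[int, ...]      # (g_1_Start, ..., g_k_Start): one value per example

# Hypothetical production order for Start; the last entry is the final else.
PRODUCTIONS = ["Plus", "x", "y", "1", "0"]


class Program:
    """Simulates P[G, E] on a concrete n: one shared stream of choices
    drives the evaluation of all k examples at once."""

    def __init__(self, n: int, examples: List[Example]) -> None:
        self.n = n
        self.examples = examples

    def nextbit(self) -> bool:
        b = self.n % 2 == 1
        self.n >>= 1
        return b

    def func_start(self) -> Vals:
        for idx, label in enumerate(PRODUCTIONS):
            if idx == len(PRODUCTIONS) - 1 or self.nextbit():
                if label == "Plus":
                    left = self.func_start()
                    right = self.func_start()
                    return tuple(a + b for a, b in zip(left, right))
                if label == "x":
                    return tuple(x for x, _ in self.examples)
                if label == "y":
                    return tuple(y for _, y in self.examples)
                return tuple(int(label) for _ in self.examples)
        raise AssertionError("unreachable: the last production is unconditional")


def psi_max2(out: int, x: int, y: int) -> bool:
    """Assumed max2 specification."""
    return out >= x and out >= y and (out == x or out == y)


def assertion_falsified(n: int, examples: List[Example]) -> bool:
    """True iff main()'s assertion (some c_i fails) is false, i.e. every
    example satisfies psi for the term encoded by n."""
    g = Program(n, examples).func_start()
    return all(psi_max2(out, x, y) for out, (x, y) in zip(g, examples))


def find_witness(examples: List[Example], bound: int = 1 << 10) -> Optional[int]:
    """Bounded search for an n that makes re_enc(sy,E) satisfiable."""
    for n in range(bound):
        if assertion_falsified(n, examples):
            return n
    return None


print(assertion_falsified(69, [(0, 1)]))  # True: Plus(x, 1) yields 1 = max(0, 1)
print(find_witness([(0, 1)]))             # 4, which encodes the term y
```

With this production order every n terminates: once the bits of n are exhausted, every choice falls through to the final else branch, which is a leaf.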


Correctness. We first need to show that the function #(·) captures the correct
language of expression-trees. Given a non-terminal A, a value n, and input values
i1, ..., ik, we use ⟦funcA[n]⟧(i1, ..., ik) = (o1, ..., ok) to denote the values of the
variables {g_1_A, ..., g_k_A} at the end of the execution of funcA[n] with the
initial value of n = n and input values i1, ..., ik. Given a non-terminal A, we
write L(G, A) to denote the set of terms that can be derived starting with A.


Lemma 3. Let A be a non-terminal, e ∈ L(G, A) an expression, and {i1, ..., ik}
an input set. Then, (⟦e⟧(i1), ..., ⟦e⟧(ik)) = ⟦funcA[#(e)]⟧(i1, ..., ik).

Each procedure funcA[n](i1, ..., ik) that we construct has an explicit dependence
on variable n, where n controls the non-deterministic choices made by funcA and
procedures called by funcA. As a consequence, when relating numbers and
expression-trees, there are two additional issues to contend with:


Non-termination. Some numbers can cause funcA[n] to fail to terminate.
For instance, if the case for “Start ::= Plus(Start, Start)” in program
P[G2, E1] from Fig. 1 were moved from the first branch (lines (3)–(8)) to the
final else case (line (13)), the number n = 0 = ...0000000 (base 2) would
cause funcStart to never terminate, due to repeated selections of Plus nodes.
However, note that the only assert statement in the program is placed at the
end of the main procedure. Now, consider a value of n such that re_enc(sy,E) is
satisfiable. Definition 3 implies that the flow of control will reach and falsify
the assertion, which implies that funcA[n] terminates.⁴

Shared suffixes of sufficient length. In Example 1, we showed how for program
P[G2, E1] (Fig. 1) the number n = 1000101 (base 2) corresponds to the
top-down, left-to-right generation of Plus(x, 1). That derivation consumed
exactly seven bits; thus, any number that, written in base 2, shares the suffix
1000101 (e.g., 11010101000101) will also generate Plus(x, 1).
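
The shared-suffix argument is plain bit arithmetic: the first w calls to nextbit() inspect only n mod 2^w. Here is a tiny Python check, where the helper name same_derivation_prefix is hypothetical and w = 7 is the number of bits consumed by Plus(x, 1) in Example 1.

```python
def same_derivation_prefix(n1: int, n2: int, width: int) -> bool:
    """True iff n1 and n2 agree on their `width` low-order bits, i.e. on
    the results of the first `width` calls to nextbit()."""
    mask = (1 << width) - 1
    return (n1 & mask) == (n2 & mask)


print(same_derivation_prefix(0b11010101000101, 0b1000101, 7))  # True
```

Any number of the form (p << 7) | 69 therefore replays the same seven choices, whatever its higher-order bits p are.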


The issue of shared suffixes is addressed in the following lemma: 


Lemma 4. For every non-terminal A and number n such that
⟦funcA[n]⟧(i1, ..., ik) ≠ ⊥ (i.e., funcA terminates when the non-deterministic
choices are controlled by n), there exists a minimal n′ that is a (base 2) suffix
of n for which (i) there is an e ∈ L(G) such that #(e) = n′, and (ii) for every
input {i1, ..., ik}, we have ⟦funcA[n]⟧(i1, ..., ik) = ⟦funcA[n′]⟧(i1, ..., ik).


We are now ready to state the correctness property of our construction. 


Theorem 2. Given a SyGuS problem sy^E = (ψ^E(f, x), G) over a finite set of
examples E, the problem sy^E is realizable iff re_enc(sy,E) is satisfiable.


5 Implementation and Evaluation 


NOPE is a tool that can return two-sided answers to unrealizability problems of
the form sy = (ψ, G). When it returns unrealizable, no expression-tree in L(G)
satisfies ψ; when it returns realizable, some e ∈ L(G) satisfies ψ; NOPE can also
time out. NOPE incorporates several existing pieces of software.


1. The (un)reachability verifier SEAHORN is applied to the reachability problems
of the form re_enc(sy,E) created during successive CEGIS rounds.

2. The SMT solver Z3 is used to check whether a generated expression-tree e
satisfies ψ. If it does, NOPE returns realizable (along with e); if it does not,
NOPE creates a new input example to add to E.


It is important to observe that SEAHORN, like most reachability verifiers, is
only sound for unsatisfiability, i.e., if SEAHORN returns unsatisfiable, the
reachability problem is indeed unsatisfiable. Fortunately, SEAHORN’s one-sided
answers are in the correct direction for our application: to prove unrealizability,
NOPE only requires the reachability verifier to be sound for unsatisfiability.

4 If the SyGuS problem deals with the synthesis of programs for a language that
can express non-terminating programs, that would be an additional source of non-
termination, different from that discussed in item Non-termination. That issue
does not arise for LIA SyGuS problems. Dealing with the more general kind of
non-termination is postponed for future work.

There is one aspect of NOPE that differs from the technique that has been 
presented earlier in the paper. While SEAHORN is sound for unreachability, it 
is not sound for reachability—i.e., it cannot soundly prove whether a synthesis 
problem is realizable. To address this problem, to check whether a given SyGuS
problem sy^E is realizable on the finite set of examples E, NOPE also calls the
SyGuS solver ESolver [2] to synthesize an expression-tree e that satisfies sy^E.⁵

In practice, for every intermediate problem sy^E generated by the CEGIS
algorithm, NOPE runs ESolver on sy^E and SEAHORN on re_enc(sy,E) in
parallel. If ESolver returns a solution e, SEAHORN is interrupted, and Z3 is used
to check whether e satisfies ψ. Depending on the outcome, NOPE either returns
realizable or obtains an additional input example to add to E. If SEAHORN
returns unsatisfiable, NOPE returns unrealizable.
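
The control flow of this loop can be sketched in Python with concurrent.futures. This is not NOPE's implementation: synthesize, prove_unreachable, and find_counterexample are hypothetical callables standing in for ESolver, SEAHORN, and Z3, and the round limit is an assumption. Python threads cannot be interrupted, so the sketch simply stops waiting for the verifier at the point where NOPE would interrupt SEAHORN.

```python
import concurrent.futures as cf
from typing import Any, Callable, List, Optional, Tuple


def nope_loop(
    synthesize: Callable[[List[Any]], Optional[Any]],
    prove_unreachable: Callable[[List[Any]], bool],
    find_counterexample: Callable[[Any], Optional[Any]],
    examples: List[Any],
    max_rounds: int = 20,
) -> Tuple[str, Any]:
    """CEGIS loop racing a synthesizer against an unreachability prover.

    synthesize(E)          -> candidate for sy^E, or None   (ESolver's role)
    prove_unreachable(E)   -> True iff re_enc(sy,E) is unsat (SeaHorn's role)
    find_counterexample(e) -> input violating psi, or None  (Z3's role)
    """
    examples = list(examples)
    with cf.ThreadPoolExecutor(max_workers=2) as pool:
        for _ in range(max_rounds):
            synth = pool.submit(synthesize, list(examples))
            proof = pool.submit(prove_unreachable, list(examples))
            pending = {synth, proof}
            candidate = None
            while pending:
                done, pending = cf.wait(pending, return_when=cf.FIRST_COMPLETED)
                if proof in done and proof.result():
                    return "unrealizable", examples
                if synth in done and synth.result() is not None:
                    candidate = synth.result()
                    break  # a real tool would kill the verifier process here
            if candidate is None:
                return "unknown", examples
            cex = find_counterexample(candidate)
            if cex is None:
                return "realizable", candidate
            examples.append(cex)  # next CEGIS round with one more example
    return "timeout", examples


print(nope_loop(lambda E: len(E), lambda E: False,
                lambda e: "cex" if e < 2 else None, []))  # ('realizable', 2)
```

The verifier's answer is checked before the synthesizer's whenever both are ready, mirroring the rule that an unsatisfiable answer from SEAHORN immediately yields unrealizable.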

Modulo bugs in its constituent components, NOPE is sound for both realiz- 
ability and unrealizability, but because of Lemma 2 and the incompleteness of 
SEAHORN, NOPE is not complete for unrealizability. 


Benchmarks. We perform our evaluation on 132 variants of the 60 LIA bench- 
marks from the LIA SyGuS competition track [2]. We do not consider the other 
SyGuS benchmark track, Bit-Vectors, because the SEAHORN verifier is unsound
for most bit-vector operations—e.g., bit-shifting. We used three suites of bench- 
marks. LIMITEDIF (resp. LIMITEDPLUS) contains 57 (resp. 30) benchmarks in 
which the grammar bounds the number of times an IfThenElse (resp. Plus) 
operator can appear in an expression-tree to be 1 less than the number required 
to solve the original synthesis problem. We used the tool QUASI to automati- 
cally generate the restricted grammars. LIMITEDCONST contains 45 benchmarks 
in which the grammar allows the program to contain only constants that are 
coprime to any constants that may appear in a valid solution—e.g., the solution 
requires using odd numbers, but the grammar only contains the constant 2. The 
numbers of benchmarks in the three suites differ because for certain benchmarks 
it did not make sense to create a limited variant—e.g., if the smallest program 
consistent with the specification contains no IfThenElse operators, no variant 
is created for the LIMITEDIF benchmark. In all our benchmarks, the grammars 
describing the search space contain infinitely many terms. 

Our experiments were performed on an Intel Core i7 4.00GHz CPU, with 
32 GB of RAM, running Lubuntu 18.10 via VirtualBox. We used version 4.8 of 
Z3, commit 97f2334 of SEAHORN, and commit d37c50e of ESolver. The timeout 
for each individual SEAHORN/ESolver call is set at 10 min. 


Experimental Questions. Our experiments were designed to answer the ques- 
tions posed below. 


EQ 1. Can NOPE prove unrealizability for variants of real SyGuS benchmarks,
and how long does it take to do so?


5 We chose ESolver because on the benchmarks we considered, ESolver outperformed 
other SYGUS solvers (e.g., CVC4 [3]). 
Finding: NOPE can prove unrealizability for 59/132 benchmarks. For the 59
benchmarks solved by NOPE, the average time taken is 15.59 s. The time taken 
to perform the last iteration of the algorithm—i.e., the time taken by SEAHORN 
to return unsatisfiable—accounts for 87% of the total running time. 

NOPE can solve all of the LIMITEDIF benchmarks for which the grammar 
allows at most one IfThenElse operator. Allowing more IfThenElse operators in 
the grammar leads to larger programs and larger sets of examples, and conse- 
quently the resulting reachability problems are harder to solve for SEAHORN. 

For a similar reason, NOPE can solve only one of the LIMITEDPLUS bench- 
marks. All other LIMITEDPLUS benchmarks allow 5 or more Plus statements, 
resulting in grammars that have at least 130 productions. 

NOPE can solve all LIMITEDCONST benchmarks because these require few 
examples and result in small encoded programs. 


EQ 2. How many examples does NOPE use to prove unrealizability and how 
does the number of examples affect the performance of NOPE? 


Note that Z3 can produce different models for the same query, and thus different 
runs of NOPE can produce different sequences of examples. Hence, there is no 
guarantee that NOPE finds a good sequence of examples that prove unrealiz- 
ability. One measure of success is whether NOPE is generally able to find a small 
number of examples, when it succeeds in proving unrealizability. 

Finding: NOPE used 1 to 9 examples to prove unrealizability for the benchmarks
on which it terminated. Problems requiring large numbers of examples could not be
solved because either ESolver or SEAHORN times out; e.g., on the problem max4,
NOPE gets to the point where the CEGIS loop has generated 17 examples, at which
point ESolver exceeds the timeout threshold.


Finding: The number of examples required to prove unrealizability depends
mainly on the arity of the synthesized function and the complexity of the
grammar. The number of examples seems to grow quadratically with the number of
bounded operators allowed in the grammar. In particular, problems in which the
grammar allows zero IfThenElse operators require 2–4 examples, while problems in
which the grammar allows one IfThenElse operator require 7–9 examples.

Figure 3 plots the running time of NOPE against the number of examples
generated by the CEGIS algorithm. Finding: The solving time appears to grow
exponentially with the number of examples required to prove unrealizability.

[Plot: running time against number of examples]

Fig. 3. Time vs examples.


6 Related Work 


The SyGuS formalism was introduced as a unifying framework to express several
synthesis problems [1]. Caulfield et al. [6] proved that it is undecidable to deter- 
mine whether a given SYGUS problem is realizable. Despite this negative result, 
there are several SYGUS solvers that compete in yearly SYGUS competitions [2]
and can efficiently produce solutions to SYGUS problems when a solution exists. 
Existing SYGUS synthesizers fall into three categories: (i) Enumeration solvers 
enumerate programs with respect to a given total order [7]. If the given prob- 
lem is unrealizable, these solvers typically only terminate if the language of the 
grammar is finite or contains finitely many functionally distinct programs. While 
in principle certain enumeration solvers can prune infinite portions of the search 
space, none of these solvers could prove unrealizability for any of the benchmarks 
considered in this paper. (ii) Symbolic solvers reduce the synthesis problem to 
a constraint-solving problem [3]. These solvers cannot reason about grammars 
that restrict allowed terms, and resort to enumeration whenever the candidate 
solution produced by the constraint solver is not in the restricted search space. 
Hence, they also cannot prove unrealizability. (iii) Probabilistic synthesizers ran- 
domly search the search space, and are typically unpredictable [14], providing 
no guarantees in terms of unrealizability. 

Synthesis as Reachability. CETI [12] introduces a technique for encoding 
template-based synthesis problems as reachability problems. The CETI encod- 
ing only applies to the specific setting in which (i) the search space is described 
by an imperative program with a finite number of holes—i.e., the values that the 
synthesizer has to discover—and (ii) the specification is given as a finite number 
of input-output test cases with which the target program should agree. Because 
the number of holes is finite, and all holes correspond to values (and not terms), 
the reduction to a reachability problem only involves making the holes global 
variables in the program (and no more elaborate transformations). 

In contrast, our reduction technique handles search spaces that are described 
by a grammar, which in general consist of an infinite set of terms (not just val- 
ues). Due to this added complexity, our encoding has to account for (i) the seman- 
tics of the productions in the grammar, and (ii) the use of non-determinism to 
encode the choice of grammar productions. Our encoding creates one expression- 
evaluation computation for each of the example inputs, and threads these com- 
putations through the program so that each expression-evaluation computation 
makes use of the same set of non-deterministic choices. 

Using input threading, our technique can handle specifications that contain
nested calls of the synthesized program, e.g., f(f(a)) = x (App. A [13]).

The input-threading technique builds a product program that performs multiple
executions of the same function, as done in relational program verification
[4]. Alternatively, a different encoding could use multiple function invocations
on individual inputs and require the verifier to thread the same bit-stream for
all input evaluations. In general, verifiers perform much better on product
programs [4], which motivates our choice of encoding.

Unrealizability in Program Synthesis. For certain synthesis problems, e.g.,
reactive synthesis [5], the realizability problem is decidable. The framework tackled
in this paper, SyGuS, is orthogonal to such problems, and it is undecidable to
check whether a given SyGuS problem is realizable [6].
Mechtaev et al. [11] propose to use a variant of SyGuS to efficiently prune
irrelevant paths in a symbolic-execution engine. In their approach, for each path
π in the program, a synthesis problem p_π is generated so that if p_π is
unrealizable, the path π is infeasible. The synthesis problems generated by
Mechtaev et al. (which are not directly expressible in SyGuS) are decidable because
the search space is defined by a finite set of templates, and the synthesis problem
can be encoded by an SMT formula. To the best of our knowledge, our technique is
the first one that can check unrealizability of general SyGuS problems in which the
search space is an infinite set of functionally distinct terms.
Abstract. We present DARTAGNAN, a bounded model checker (BMC)
for concurrent programs under weak memory models. Its distinguishing
feature is that the memory model is not implemented inside the tool
but taken as part of the input. DARTAGNAN reads CAT, the standard
language for memory models, which allows one to define x86/TSO, ARMv7,
ARMv8, Power, C/C++, and Linux kernel concurrency primitives.
BMC with memory models as inputs is challenging. One has to encode 
into SMT not only the program but also its semantics as defined by 
the memory model. What makes DARTAGNAN scale is its relation anal- 
ysis, a novel static analysis that significantly reduces the size of the 
encoding. DARTAGNAN matches or even exceeds the performance of the 
model-specific verification tools NIDHUGG and CBMC, as well as the 
performance of HERD, a CAT-compatible litmus testing tool. Compared 
to the unoptimized encoding, the speed-up is often more than two orders 
of magnitude. 


Keywords: Weak memory models · CAT · Concurrency · BMC · SMT


1 Introduction 


When developing concurrency libraries or operating system kernels, performance 
and scalability of the concurrency primitives is of paramount importance. These 
primitives rely on the synchronization guarantees of the underlying hardware 
and the programming language runtime environment. The formal semantics of 
these guarantees are often defined in terms of weak memory models. There is 
considerable interest in verification tools that take memory models into account 
(5,9, ba, 22]. 

A successful approach to formalizing weak memory models is CAT [11, 12, 16], 
a flexible specification language in which all memory models considered so far can 
be expressed succinctly. CAT, together with its accompanying tool HERD [4], 
© The Author(s) 2019 
has been used to formalize the semantics not only of assembly for x86/TSO,
POWER, ARMv7 and ARMv8, but also high-level programming languages, such 
as C/C++, transactional memory extensions, and recently the LINUX kernel 
concurrency primitives [11,15,16,18,20,24,29]. This success indicates the need 
for universal verification tools that are not limited to a specific memory model. 

We present DARTAGNAN [3], a bounded model checker that takes memory 
models as inputs. DARTAGNAN expects a concurrent program annotated with an 
assertion and a memory model for which the verification should be conducted. It 
verifies the assertion on those executions of the program that are valid under the 
given memory model and returns a counterexample execution if the verification 
fails. As is typical of BMC, the verification results hold relative to an unrolling 
bound [21]. The encoding phase, however, is new. Not only the program but also 
its semantics as defined by the CAT model are translated into an SMT formula. 

Having to take into account the semantics quickly leads to large encodings. 
To overcome this problem, DARTAGNAN implements a novel relation analysis, 
which can be understood as a static analysis of the program semantics as defined 
by the memory model. More precisely, CAT defines the program semantics in 
terms of relations between the events that may occur in an execution. Depending 
on constraints over these relations, an execution is considered valid or invalid. 
Relation analysis determines the pairs of events that may influence a constraint of 
the memory model. Any remaining pair can be dropped from the encoding. The 
analysis is compatible with optimized fixpoint encodings presented in [27, 28]. 

The second novelty is the support for advanced programming constructs. 
We redesigned DARTAGNAN’s heap model, which now has pointers and arrays. 
Furthermore, we enriched the set of synchronization primitives, including read- 
modify-write and read-copy-update (RCU) instructions [26]. One motivation for 
this richer set of programming constructs is the Linux kernel memory model [15] 
that has recently been added to the kernel documentation [2]. This model has 
already been used by kernel developers to find bugs in and clarify details of the 
concurrency primitives. Since the model is expected to be refined with further 
development of the kernel, verification tools will need to quickly accommodate 
updates in the specification. So far, only HERD [4] has satisfied this require- 
ment. Unfortunately, it is limited to fairly small programs (litmus tests). The 
present version of DARTAGNAN offers an alternative with substantially better 
performance. 

We present experiments on a series of benchmarks consisting of 4751 LINUX 
litmus tests and 7 mutual exclusion algorithms executed on TSO, ARM, and 
Linux. Despite the flexibility of taking memory models as inputs, DARTAGNAN’s 
performance is comparable to CBMC [13] and considerably better than that of 
NIDHUGG [5,9]. Both are model-specific tools. Compared to the previous version 
of DARTAGNAN [28] and compared to HERD [4], we gain a speed-up of more than 
two orders of magnitude, thanks to the relation analysis. 


Related Work. In terms of the verification task to be solved, the following 
tools are the closest to ours. CBMC [13] is a scalable bounded model checker 
supporting TSO, but not ARM. An earlier version also supported POWER. 
NIDHUGG [5,9] is a stateless model checker supporting TSO, POWER, and a
subset of ARMv7. It is excellent for programs with a small number of executions. 
RCMC [22] implements a stateless model checking algorithm targeting C11. 
We cannot directly benchmark against it because the source code of the tool 
is not yet publicly available, nor do we fully support C11. HERD [4] is the 
only tool aside from ours that takes a CAT memory model as input. HERD 
does not scale well to programs with a large number of executions, including 
some of the LINUX kernel tests. Other verification tasks (e.g., fence insertion to 
restore sequential consistency) are tackled by MEMORAX [6-8], OFFENCE [14], 
FENDER [23], DFENCE [25], and TRENCHER [19]. 


Relation Analysis on an Example. Consider the program (in the .litmus 
format) given to the left in the figure below. The assertion asks whether there 
is a reachable state with final values EBX = 1, ECX = 0. We analyze the program 
under the x86-TSO memory model shown below the program. The semantics of 
the program under TSO is a set of executions. An execution is a graph, similar 
to the one given below, where the nodes are events and the edges correspond to 
the relations defined by the memory model. Events are instances of instructions 
that access the shared memory: R (loads), W (stores, including initial stores), 
and M (the union of both). The atomic exchange instruction xchg [x], EAX gives 
rise to a pair of read and write events related by a (dashed) rmw edge. Such 
reads and writes belong to the set A of atomic read-modify-write events. 


    {x = 0; y = 0; P0:EAX = 1;}
    P0              | P1            ;
    xchg [x], EAX   | mov EBX, [y]  ;
    mov [y], 1      | mov ECX, [x]  ;
    exists (P1:EBX = 1 ∧ P1:ECX = 0)

[Figure: execution graph of the program; nodes are events, edges are relations of the memory model]

x86-TSO

    acyclic po-loc ∪ com
    acyclic ghb-tso
    empty rmw ∩ (fre ; coe)

    com      = co ∪ fr ∪ rf
    com-tso  = co ∪ fr ∪ rfe
    po-tso   = (po \ W×R) ∪ mfence
    implied  = po ∩ (W×R) ∩ ((M×A) ∪ (A×M))
    ghb-tso  = po-tso ∪ com-tso ∪ implied


The relations rf, co, and fr model the communication of instructions via the 
shared memory (reading from a write, coherence, overwriting a read). Their 
restrictions rfe, coe, and fre denote (external) communication between instruc- 
tions from different threads. Relation po is the program order within the same 
thread and po-loc is its restriction to events addressing the same memory loca- 
tion. Edges of mfence relate events separated by a fence. Further relations are 
derived from these base relations. To belong to the TSO semantics of the 
program, an execution has to satisfy the constraints of the memory model: 
empty rmw ∩ (fre;coe), which enforces atomicity of read-modify-write events,
and the two acyclicity constraints. 
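
The three x86-TSO constraints can be evaluated directly on one candidate execution, with relations represented as Python sets of event pairs. The execution below is a hypothetical reconstruction for the litmus program: a and b are the read and write of xchg, c is the store to y, d and e are P1's loads, f and g are initial stores placed in their own thread, and rf/co are chosen so that P1:EBX = 1 and P1:ECX = 0. There are no fences, so mfence is empty. All helper names are illustrative.

```python
from itertools import product
from typing import Dict, Set, Tuple

Rel = Set[Tuple[str, str]]


def compose(r1: Rel, r2: Rel) -> Rel:
    """Relational composition r1 ; r2."""
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}


def inverse(r: Rel) -> Rel:
    return {(b, a) for (a, b) in r}


def cross(s1: Set[str], s2: Set[str]) -> Rel:
    return set(product(s1, s2))


def acyclic(r: Rel) -> bool:
    """Depth-first search; an edge to a vertex on the stack closes a cycle."""
    succ: Dict[str, Set[str]] = {}
    for a, b in r:
        succ.setdefault(a, set()).add(b)
    state: Dict[str, int] = {}  # 1 = on stack, 2 = finished

    def visit(v: str) -> bool:
        state[v] = 1
        for w in succ.get(v, ()):
            if state.get(w) == 1:
                return False
            if w not in state and not visit(w):
                return False
        state[v] = 2
        return True

    return all(visit(v) for v in list(succ) if v not in state)


thread = {"f": "init", "g": "init", "a": "P0", "b": "P0", "c": "P0", "d": "P1", "e": "P1"}
loc = {"f": "x", "g": "y", "a": "x", "b": "x", "c": "y", "d": "y", "e": "x"}
W, R = {"f", "g", "b", "c"}, {"a", "d", "e"}
M, A = W | R, {"a", "b"}  # A: events of the atomic xchg

po = {("a", "b"), ("a", "c"), ("b", "c"), ("d", "e")}
rmw = {("a", "b")}
mfence: Rel = set()
rf = {("f", "a"), ("c", "d"), ("f", "e")}  # EBX = 1, ECX = 0
co = {("f", "b"), ("g", "c")}

ext = {(p, q) for p in M for q in M if thread[p] != thread[q]}
same_loc = {(p, q) for p in M for q in M if loc[p] == loc[q]}
fr = compose(inverse(rf), co)
rfe, coe, fre = rf & ext, co & ext, fr & ext
com = co | fr | rf
com_tso = co | fr | rfe
po_tso = (po - cross(W, R)) | mfence
implied = po & cross(W, R) & (cross(M, A) | cross(A, M))
ghb_tso = po_tso | com_tso | implied

checks = {
    "acyclic po-loc | com": acyclic((po & same_loc) | com),
    "acyclic ghb-tso": acyclic(ghb_tso),
    "empty rmw & (fre;coe)": not (rmw & compose(fre, coe)),
}
print(checks)  # the ghb-tso check is False
```

Under these choices the ghb-tso check fails because of the cycle b → c → d → e → b (po-tso, rfe, po-tso, fre), so this execution is not allowed by x86-TSO.
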
DARTAGNAN encodes the semantics of the given program under the given
memory model into an SMT formula. The problem is that each edge (a,b) that 
may be present in a relation r gives rise to a variable r(a,b). The goal of our 
relation analysis is to reduce the number of edges that need to be encoded. We 
illustrate this on the constraint acyclic ghb-tso. The graph next to the program 
shows the 14 (dotted and solid) edges which may contribute to the relation ghb- 
tso. Of those, only the 6 solid edges can occur in a cycle. The dotted edges 
can be dropped from the SMT encoding. Our relation analysis determines the 
solid edges—edges that may have an influence on a constraint of the memory 
model. Additionally, ghb-tso is a composition of various subrelations (e.g., po-tso 
or co ∪ fr) that also require encoding into SMT. Relation analysis applies to
subrelations as well. Applied to all constraints, it reduces the number of encoded 
edges for all (sub)relations from 221 to 58. 
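
A simplified view of why dotted edges can be dropped: an edge (u, v) of the may-graph can lie on a cycle only if u is reachable from v. The Python sketch below applies just that test to a single relation. DARTAGNAN's relation analysis is more general (it also propagates through subrelations), and the small graph in the usage line is hypothetical rather than the one in the figure; the function names are illustrative.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def reachable(succ: Dict[str, Set[str]], src: str) -> Set[str]:
    seen, stack = {src}, [src]
    while stack:
        v = stack.pop()
        for w in succ.get(v, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def may_lie_on_cycle(may_edges: Set[Edge]) -> Set[Edge]:
    """Keep (u, v) only if u is reachable from v in the may-graph, i.e. only
    if some cycle of possible edges could contain (u, v)."""
    succ: Dict[str, Set[str]] = {}
    for u, v in may_edges:
        succ.setdefault(u, set()).add(v)
    return {(u, v) for (u, v) in may_edges if u in reachable(succ, v)}


may = {("f", "b"), ("g", "c"), ("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "b")}
print(sorted(may_lie_on_cycle(may)))
# [('b', 'c'), ('c', 'd'), ('d', 'e'), ('e', 'b')]
```

Only edges that survive this filter would need a Boolean variable for an acyclicity constraint.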


2 Input, Functionality, and Implementation 


DARTAGNAN has the ambition of being widely applicable, from assembly over 
operating system code written in C/C++ to lock-free data structures. The tool 
accepts programs in PPC, x86, AArch64 assembly, and a subset of C11, all 
limited to the subsets supported by Herd’s .litmus format. It also reads our own 
.pts format with C11-like syntax [28]. We refer to global variables as memory 
locations and to local variables as registers. We support pointers, i.e., a register 
may hold the address of a location. Addresses and values are integers, and we 
allow the same arithmetic operations for addresses as for regular integer values. 
Different synchronization mechanisms are available, including variants of read- 
modify-write, various fences, and RCU instructions [26]. 

We support the assertion language of HERD. Assertions define inequalities 
over the values of registers and locations. They come with quantifiers over the 
reachable states that should satisfy the inequalities. 

We use the CAT language [11,12,16] to define memory models. A memory 
model consists of named relations between events that may occur in an execution. 
Whether or not an execution is valid is defined by constraints over these relations: 


    ⟨MM⟩    ::= ⟨const⟩ | ⟨rel⟩ | ⟨MM⟩ ∧ ⟨MM⟩
    ⟨const⟩ ::= acyclic(⟨r⟩) | irreflexive(⟨r⟩) | empty(⟨r⟩)
    ⟨rel⟩   ::= ⟨name⟩ := ⟨r⟩
    ⟨r⟩     ::= ⟨b⟩ | ⟨name⟩ | ⟨r⟩ ∪ ⟨r⟩ | ⟨r⟩ \ ⟨r⟩ | ⟨r⟩ ∩ ⟨r⟩ | ⟨r⟩⁻¹ | ⟨r⟩⁺ | ⟨r⟩* | ⟨r⟩ ; ⟨r⟩
    ⟨b⟩     ::= id | int | ext | po | fencerel(⟨fence⟩) | rmw | ctrl | data | addr | loc | rf | co


CAT has a rich relational language, and we only show an excerpt above. So- 
called base relations (b) model the control flow, data flow, and synchronization 
constraints. The language provides intuitive operators to derive further rela- 
tions. One may define relations recursively by referencing named relations. Their 
semantics is the least fixpoint. 


Relation Analysis for Compact SMT Encodings 359 


DARTAGNAN is invoked with two inputs: the program, annotated with an 
assertion over the final states, and the memory model. There are two optional 
parameters related to the verification. The SMT encoding technique for recursive 
relations is defined by mode chosen between knastertarski (default) and idl (see 
below). The parameter alias, chosen between none and andersen (default), defines 
whether to use an alias analysis for our relation analysis (cf. Sect. 3). 

Being a bounded model checker, DARTAGNAN computes an unrolled program 
with conditionals but no loops. It encodes this acyclic program together with the 
memory model into an SMT formula and passes it to the Z3 solver. The formula 
has the form φ_prog ∧ φ_assert ∧ φ_mm, where φ_prog encodes the program, φ_assert
the assertion, and φ_mm the memory model. We elaborate on the encoding of the
program and the memory model. The assertion is already given as a formula. 

We model the heap by encoding a new memory location for each variable 
and a set of locations for each memory allocation of an array. Every location has 
an address encoded as an integer variable whose value is chosen by the solver. In 
an array, the locations are required to have consecutive addresses. Instances of 
instructions are modeled as events, most notably stores (to the shared memory) 
and loads (from the shared memory). 

We encode relations by associating pairs of events with Boolean variables. 
Whether the pair (e1, e2) is contained in relation r is indicated by the variable
r(e1, e2). Encoding the relations r1 ∩ r2, r1 ∪ r2, r1 ; r2, r1 \ r2, and r⁻¹ is
straightforward [27]. For recursively defined and (reflexive and) transitive rela- 
tions, DARTAGNAN lets the user choose between two methods for computing 
fixed points by setting the appropriate parameter. The integer-difference logic 
(IDL) method encodes a Kleene iteration by means of integer variables (one for 
each pair of events) representing the step in which the pair was added to the 
relation [27]. The Knaster-Tarski encoding simply looks for a post fixpoint. We 
have shown in [28] that this is sufficient for reachability analysis. 
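To make the IDL idea concrete, here is a minimal Python sketch (an illustration, not DARTAGNAN's encoding) that runs a Kleene iteration for the transitive closure of an explicitly given finite relation and records, for each pair, the iteration step in which it was added. In the IDL method these step numbers are what the per-pair integer variables represent and the solver chooses them; here they are simply computed. The function name `kleene_closure_with_steps` and the event names are hypothetical.

```python
def kleene_closure_with_steps(base):
    """Transitive closure of `base` by Kleene iteration.

    Returns a dict mapping each pair of the closure to the iteration step in
    which it was first added (step 0 = the pairs of `base` itself).
    """
    steps = {pair: 0 for pair in base}
    step = 0
    while True:
        step += 1
        # One Kleene step: compose the current approximation with the base relation.
        new_pairs = {
            (a, d)
            for (a, b) in steps
            for (c, d) in base
            if b == c and (a, d) not in steps
        }
        if not new_pairs:  # nothing new: least fixpoint reached
            return steps
        for pair in new_pairs:
            steps[pair] = step


po = {("e1", "e2"), ("e2", "e3"), ("e3", "e4")}
for pair, step in sorted(kleene_closure_with_steps(po).items()):
    print(pair, step)
```

A pair added in step k is one whose shortest witness path has k + 1 edges, so the step numbers give a well-founded justification for every pair in the closure.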


3 Relation Analysis 


To optimize the size of the encoding (and the solving times), we found it essential 
to reduce the domains of the relations. We determine for each relation a static 
over-approximation of the pairs of events that may be in this relation. Moreover,
we restrict the relation to the set of pairs that may influence a constraint of the
given memory model. These restricted sets constitute the relation analysis information
(of the program relative to the memory model). Technically, we compute, for
each relation $r$, two sets of event pairs, $M(r)$ and $A(r)$. The former contains the so-called
may pairs, the pairs of events that may be in relation $r$. This set does not yet
take into account whether the may pairs occur in some constraint of the memory
model. The active pairs $A(r)$ incorporate this information and hence restrict the
set of may pairs. As a consequence of the relation analysis, we only add
Boolean variables $r(e_1, e_2)$ for pairs $(e_1, e_2) \in A(r)$ to the SMT encoding.
The algorithm for constructing the may set and the active set is a fix- 
point computation. What is unconventional is that the two sets propagate their 
information in different directions. For $A(r)$, the computation starts from the
constraints and propagates information down the syntax tree of the CAT memory
model. The sets $M(r)$ are computed bottom-up along the syntax tree. Interestingly,
in our implementation, we do not compute the full fixpoint but let the top-down
process trigger the required bottom-up computation.

Both sets are computed as least solutions to a common system of inequalities. 
As we work over powerset lattices (relations are sets after all), the order is
inclusion. We understand each set $M(r)$ and $A(r)$ as a variable,
thereby identifying it with its least solution. To begin with, we give the definition 
for A(r). In the base case, we have a relation r that occurs in a constraint of the 
memory model. The inequality is defined based on the shape of the constraint: 


$A(r) \supseteq M(r)$ (empty)    $A(r) \supseteq M(r) \cap id$ (irrefl.)    $A(r) \supseteq M(r) \cap M(r^+)^{-1}$ (acyclic).


For the emptiness constraint, all pairs of events that may be contained in the 
relation are relevant. If the constraint requires irreflexivity, what matters are 
the pairs $(e, e)$. If the constraint requires acyclicity, we concentrate on the pairs
$(e_1, e_2)$ where $(e_1, e_2)$ may be in relation $r$ and $(e_2, e_1)$ may be in relation $r^+$.
Note how the definition of active pairs triggers the computation of may pairs. 
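The three base cases can be evaluated directly on explicit sets of event pairs. The sketch below assumes the may sets are already available as Python sets and takes $M(r^+)$ to be $M(r)^+$, the least solution of the may inequality for transitive closure; the helper names are ours, not DARTAGNAN's.

```python
def compose(r1, r2):
    """Relational composition r1 ; r2 on sets of pairs."""
    return {(a, d) for (a, b) in r1 for (c, d) in r2 if b == c}


def transitive_closure(r):
    """r+ computed by repeated composition until a fixpoint is reached."""
    closure = set(r)
    while True:
        extra = compose(closure, r) - closure
        if not extra:
            return closure
        closure |= extra


def inverse(r):
    return {(b, a) for (a, b) in r}


def active_from_constraint(kind, may):
    """Base-case active pairs A(r) for a relation r that occurs in a constraint."""
    if kind == "empty":
        return set(may)  # A(r) ⊇ M(r)
    if kind == "irreflexive":
        return {(a, b) for (a, b) in may if a == b}  # A(r) ⊇ M(r) ∩ id
    if kind == "acyclic":
        return may & inverse(transitive_closure(may))  # A(r) ⊇ M(r) ∩ M(r+)^-1
    raise ValueError(f"unknown constraint kind: {kind}")


may = {("a", "b"), ("b", "a"), ("b", "c")}
print(sorted(active_from_constraint("acyclic", may)))  # [('a', 'b'), ('b', 'a')]
```

In the acyclicity example, the pair (b, c) is dropped because no may path leads from c back to b, so it cannot lie on a cycle.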

If the relation in the constraint is a composed one, the following inequalities 
propagate the information about the active pairs down the syntax tree of the 
CAT memory model: 


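$A(r_1) \supseteq A(r)^{-1}$ if $r = r_1^{-1}$
$A(r_1) \supseteq A(r)$ if $r = r_1 \cap r_2$ or $r = r_1 \setminus r_2$
$A(r_1) \supseteq A(r) \cap M(r_1)$ if $r = r_1 \cup r_2$ or $r = r_2 \setminus r_1$
$A(r_1) \supseteq \{x \in M(r_1) \mid x ; M(r_2) \cap A(r) \neq \emptyset\}$ if $r = r_1 ; r_2$
$A(r_1) \supseteq \{x \in M(r_1) \mid M(r)^* ; x ; M(r)^* \cap A(r) \neq \emptyset\}$ if $r = r_1^+$ or $r = r_1^*$.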


The definition maintains the invariant $A(r) \subseteq M(r)$. If a pair $(e_1, e_2)$ is relevant
to relation $r = r_1^{-1}$, then $(e_2, e_1)$ is relevant to $r_1$. We do not have to
intersect $A(r)^{-1}$ with $M(r)^{-1}$ because $A(r) \subseteq M(r)$ ensures $A(r)^{-1} \subseteq M(r)^{-1}$.
We can avoid the intersection with the may pairs for the next case as well. There,
$A(r) \subseteq M(r)$ holds by the invariant, and $M(r) = M(r_1) \cap M(r_2)$ (respectively
$M(r) = M(r_1)$ for subtraction) by definition (see below). For union and the other
case of subtraction, the intersection with $M(r_1)$ is necessary. There are symmetric
definitions for union and intersection for $r_2$.
For a relation $r_1$ that occurs in a relational composition $r = r_1 ; r_2$, the pairs
$(e_1, e_3)$ become relevant if they may be composed with a pair $(e_3, e_2)$ that may be in $r_2$ to
obtain a pair $(e_1, e_2)$ relevant to $r$. Note that for $r_2$ we again need the may pairs.
The definition for $r_2$ is similar. The definition for the (reflexive and) transitive
closure follows the ideas for relational composition. 
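The composition rules act as a filter over the may pairs of each operand. A minimal sketch on explicit sets, assuming $M(r_1)$, $M(r_2)$ and $A(r)$ are given; the rule for $r_2$ is our reading of "similar" (compose $M(r_1)$ with a candidate pair on the left), and the function names are hypothetical.

```python
def compose(r1, r2):
    """Relational composition r1 ; r2 on sets of pairs."""
    return {(a, d) for (a, b) in r1 for (c, d) in r2 if b == c}


def active_left(may_r1, may_r2, active_r):
    """A(r1) for r = r1 ; r2: pairs x in M(r1) such that x ; M(r2) meets A(r)."""
    return {x for x in may_r1 if compose({x}, may_r2) & active_r}


def active_right(may_r1, may_r2, active_r):
    """Symmetric rule for r2: pairs y in M(r2) such that M(r1) ; y meets A(r)."""
    return {y for y in may_r2 if compose(may_r1, {y}) & active_r}


may_r1 = {("e1", "e3"), ("e1", "e4")}
may_r2 = {("e3", "e2"), ("e4", "e5")}
active_r = {("e1", "e2")}
print(active_left(may_r1, may_r2, active_r))   # {('e1', 'e3')}
print(active_right(may_r1, may_r2, active_r))  # {('e3', 'e2')}
```

The pair (e1, e4) is discarded for $r_1$ even though it is a may pair, because its only continuation leads to (e1, e5), which is not active in $r$.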

The definition of the may sets follows the syntax of the CAT memory model 
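bottom-up. With $\oplus \in \{\cup, \cap, ;\}$ and $\otimes \in \{+, *, -1\}$, we have:
$M(r_1 \oplus r_2) \supseteq M(r_1) \oplus M(r_2)$    $M(r^{\otimes}) \supseteq M(r)^{\otimes}$    $M(r_1 \setminus r_2) \supseteq M(r_1)$.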




[Figure: one panel per benchmark (Parker, Peterson, Dekker, Burns, Bakery, Lamport, Szymanski); legend entries include FMCAD-ARM and FMCAD-TSO]

Fig. 1. Impact of the unrolling bound (x-axis) on the verification time (y-axis).


This simply applies the operator of the relation to the corresponding may sets.
Subtraction $r_1 \setminus r_2$ is the exception: it is not sound to over-approximate $r_2$, which is why $M(r_1 \setminus r_2)$ only depends on $M(r_1)$.
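The bottom-up computation can be sketched as a recursive evaluation over the syntax tree. The tuple-based AST, the operator names, and the `events` parameter (needed for the identity part of `*`) are assumptions made for this illustration; the may sets of the base relations are simply given here, whereas the text derives them from the control flow and the alias analysis.

```python
def compose(r1, r2):
    """Relational composition r1 ; r2 on sets of pairs."""
    return {(a, d) for (a, b) in r1 for (c, d) in r2 if b == c}


def plus(r):
    """Transitive closure r+ by iterating composition until nothing new appears."""
    closure = set(r)
    while True:
        extra = compose(closure, r) - closure
        if not extra:
            return closure
        closure |= extra


def may_set(expr, base_may, events):
    """Bottom-up may set M(expr) for a relation expression.

    expr is a hypothetical tuple AST: ("base", name),
    ("union" | "inter" | "seq" | "minus", e1, e2), or ("inv" | "plus" | "star", e).
    base_may maps base relation names to their may sets; events is the set of
    all events (needed for the identity part of "star").
    """
    op = expr[0]
    if op == "base":
        return set(base_may[expr[1]])
    if op in ("union", "inter", "seq", "minus"):
        left = may_set(expr[1], base_may, events)
        if op == "minus":
            # M(r1 \ r2) ⊇ M(r1): over-approximating r2 would be unsound, so r2 is ignored.
            return left
        right = may_set(expr[2], base_may, events)
        if op == "union":
            return left | right
        if op == "inter":
            return left & right
        return compose(left, right)  # op == "seq"
    inner = may_set(expr[1], base_may, events)
    if op == "inv":
        return {(b, a) for (a, b) in inner}
    if op == "plus":
        return plus(inner)
    if op == "star":
        return plus(inner) | {(e, e) for e in events}
    raise ValueError(f"unknown operator: {op}")


base_may = {"po": {("w1", "r1")}, "fr": {("r1", "w2")}, "co": {("w1", "w2")}}
events = {"w1", "r1", "w2"}
expr = ("minus", ("seq", ("base", "po"), ("base", "fr")), ("base", "co"))
print(may_set(expr, base_may, events))  # {('w1', 'w2')}
```

Note that the pair (w1, w2) survives the subtraction even though it may also be in co: only a definite membership in $r_2$ could justify removing it, and the may set does not provide that.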

At the bottom level, the may sets are determined by the base relations. They 
depend on the shape of the relations and the positions of the events in the 
control flow. The relations loc, co and rf are concerned with memory accesses. 
What makes it difficult to approximate these relations is our support for pointers 
and pointer arithmetic. Without further information, we have to conservatively 
assume that a memory event may access any address. To improve the precision of 
the may sets for loc, co, and rf, our fixpoint computation incorporates a may-alias 
analysis. We use a control-flow insensitive Andersen-style analysis [17]. It incurs 
only a small overhead and produces a close over-approximation of the may sets. 
The analysis returns¹ a set of pairs of memory events $PTS \subseteq (W \cup R) \times (W \cup R)$
such that every pair of events outside $PTS$ definitely accesses different addresses.
Here, $W$ are the store events in the program and $R$ are the loads. Note that the
analysis has to be control-flow insensitive as the given memory model may be
very weak [10]. We have $M(loc) \supseteq PTS$. Similarly, $M(co)$ and $M(rf)$ are defined
by $PTS$ restricted to $(W \times W)$ and $(W \times R)$, respectively.
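For readers unfamiliar with Andersen-style analysis, the following is a textbook, naive-fixpoint version in Python, followed by the alias test of footnote 1 (two registers may alias if their points-to sets intersect) and the restriction to $W \times R$ used for $M(rf)$. The statement encoding (`addr`, `copy`, `load`, `store`) and all names are hypothetical; this is not DARTAGNAN's implementation.

```python
from collections import defaultdict


def andersen(statements):
    """Flow-insensitive, Andersen-style points-to analysis (naive fixpoint).

    Each statement is (kind, lhs, rhs) with kind in:
      "addr":  lhs = &rhs    "copy":  lhs = rhs
      "load":  lhs = *rhs    "store": *lhs = rhs
    Statement order is irrelevant: all statements are re-applied
    until no points-to set grows.
    """
    pts = defaultdict(set)

    def add(var, locations):
        new = set(locations) - pts[var]
        pts[var] |= new
        return bool(new)  # True if pts[var] grew

    changed = True
    while changed:
        changed = False
        for kind, lhs, rhs in statements:
            if kind == "addr":
                changed |= add(lhs, {rhs})
            elif kind == "copy":
                changed |= add(lhs, pts[rhs])
            elif kind == "load":
                for loc in list(pts[rhs]):
                    changed |= add(lhs, pts[loc])
            elif kind == "store":
                for loc in list(pts[lhs]):
                    changed |= add(loc, pts[rhs])
            else:
                raise ValueError(f"unknown statement kind: {kind}")
    return pts


def may_alias_pairs(address_register, pts):
    """Pairs of memory events whose address registers may alias.

    address_register maps each event to the register holding its address;
    pairs outside the result definitely access different addresses.
    """
    return {
        (e1, e2)
        for e1, reg1 in address_register.items()
        for e2, reg2 in address_register.items()
        if pts[reg1] & pts[reg2]
    }


# Usage: p and r may both point to x, q only to y.
pts = andersen([("addr", "p", "x"), ("addr", "q", "y"), ("copy", "r", "p")])
PTS = may_alias_pairs({"W1": "p", "R1": "r", "R2": "q"}, pts)
writes, reads = {"W1"}, {"R1", "R2"}
may_rf = {(w, r) for (w, r) in PTS if w in writes and r in reads}
print(sorted(may_rf))  # [('W1', 'R1')]
```

The store W1 can only feed the load R1 here; without the alias analysis, (W1, R2) would also have to be a may pair of rf.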

We stress the importance of the alias analysis for our relation analysis: loc, 
co, and rf are frequently used as building blocks of composite relations. Excessive 
may sets will therefore negatively affect the over-approximations of virtually all 
relations in a memory model, and keep the overall encoding unnecessarily large. 


Illustration. We illustrate the relation analysis on the example from the intro- 
duction. Consider constraint acyclic ghb-tso. The computation of the active set 
for the relation ghb-tso triggers the calculation of the may set, following the 
inequality $A(\text{ghb-tso}) \supseteq M(\text{ghb-tso}) \cap M(\text{ghb-tso}^{+})^{-1}$. The may set is the union
of the may sets for the subrelations, shown by colored (dotted and solid) edges. 


1 This is a simplification: Andersen's analysis returns points-to sets, and we check whether two
registers $r_1$ and $r_2$ may alias by testing whether the intersection $PTS(r_1) \cap PTS(r_2)$ is non-empty.
Fig. 2. Execution times (logarithmic scale) on LINUX kernel litmus tests: impact of
alias analysis (left) and comparison against HERD (right). 


The intersection yields the edges that may lie on cycles of ghb-tso. They are 
drawn in solid. These solid edges in $A(\text{ghb-tso})$ are propagated down to the
subrelations. For example, $A(\text{po-tso}) \supseteq A(\text{ghb-tso}) \cap M(\text{po-tso})$ yields the solid black
edges. 


4 Experiments 


We compare DARTAGNAN to CBMC [13] and NIDHUGG [5,9], both model-specific
tools, and to HERD [4,16] and the DARTAGNAN FMCAD-18 version
[3,28] (without relation analysis), both taking CAT models as inputs. We also 
evaluate the impact of the alias analysis on the execution time. 


Benchmarks. For CBMC, NIDHUGG, and the FMCAD-18 DARTAGNAN, we 
evaluate the performance on 7 mutual exclusion benchmarks executed on TSO 
(all tools) and a subset of ARMv7 (only NIDHUGG and DARTAGNAN). The 
results on POWER are similar to those on ARM and thus omitted. We excluded 
HERD from this experiment since it did not scale even for small unrolling bounds 
[28]. We set a 5 min timeout for Parker, Dekker, and Peterson as this is sufficient 
to show the trends in the runtimes, and a 30 min timeout for the remaining
benchmarks. To compare against HERD, and to evaluate the impact of the alias 
analysis, we run 4751 LINUX kernel litmus tests (all tests from [1] without LINUX 
spinlocks). The tests contain kernel primitives, such as RCU, on the LINUX kernel 
model. We set a 30 min timeout. 


Evaluation. The times for CBMC, NIDHUGG-ARM, and the FMCAD-2018
version of DARTAGNAN grow exponentially for Parker (see Fig. 1). The growth
in CBMC and FMCAD-2018 is due to the explosion of the encoding. For the
latter, the solver runs out of memory with unrolling bounds 20 (TSO) and 10
(ARM). NIDHUGG-ARM explores many unnecessary executions.
The verification times for NIDHUGG-TSO and the current version of DARTAGNAN
grow linearly; for DARTAGNAN, this is due to the relation analysis. For Peterson, the
results are similar except for CBMC, which matches DARTAGNAN’s performance. 

For Dekker, NIDHUGG outperforms both CBMC and DARTAGNAN. This is 
because the number of executions grows slowly compared to the explosion of the 
number of instructions. The executions in both memory models coincide, making
NIDHUGG's performance on ARM comparable to that on TSO. The remaining
difference is due to NIDHUGG's exploration being optimal on TSO, but not on ARM. Relation
analysis has some impact on the performance (see FMCAD-2018 vs. DARTAG- 
NAN), but the encoding size still grows faster than the number of executions. 

The benchmarks Burns, Bakery, and Lamport demonstrate the opposite 
trend: the number of executions grows much faster than the size of the encoding. 
Here, CBMC and DARTAGNAN outperform NIDHUGG. Notice that for Burns, 
NIDHUGG performs better on ARM than on TSO with unrolling bound 5. 
This is counter-intuitive since one expects more executions on ARM. Although 
the number of executions coincides, the exploration time is higher on TSO due
to a different search algorithm. For Szymanski, similar results hold except for 
DARTAGNAN-ARM where the encoding grows exponentially. 

Figure 2 (left) shows the verification times for the current version of DARTAG- 
NAN with and without alias analysis. The alias analysis results in a speed-up of 
more than two orders of magnitude in benchmarks with several threads accessing 
up to 18 locations. Figure 2 (right) compares the performance of DARTAGNAN
against HERD. We used the Knaster-Tarski encoding and alias analysis since
they yield the best performance. HERD outperforms DARTAGNAN on small test
instances (less than 1 s execution time). This is due to the JVM startup time and
the preprocessing costs of DARTAGNAN. However, on large benchmarks, HERD
times out while DARTAGNAN takes less than 10 s.
Abstract. Designing protocols for multi-agent interaction that achieve
the desired behavior is a challenging and error-prone process. The stan- 
dard practice is to manually develop proofs of protocol correctness that 
rely on human intuition and require significant effort to develop. Even 
then, proofs can have mistakes that may go unnoticed even after peer review,
modeling and simulation, and testing. The use of formal methods can 
reduce the potential for such errors. In this paper, we discuss our expe- 
rience applying model checking to a previously published multi-agent 
protocol for unmanned air vehicles. The original publication provides a 
compelling proof of correctness, along with extensive simulation results 
to support it. However, analysis through model checking found an error 
in one of the proof’s main lemmas. In this paper, we start by provid- 
ing an overview of the protocol and its original “proof” of correctness, 
which represents the standard practice in multi-agent protocol design. 
We then describe how we modeled the protocol for a three-vehicle sys- 
tem in a model checker, the counterexample it returned, and the insight 
this counterexample provided. We also discuss benefits, limitations, and 
lessons learned from this exercise, as well as what future efforts would be 
needed to fully verify the protocol for an arbitrary number of vehicles. 


Keywords: Multi-agent systems - Distributed systems - Autonomy - 
Model checking 


1 Introduction 


Many robotics applications require multi-agent interaction. However, designing 
protocols for multi-agent interaction that achieve the desired behavior can be 
challenging. The design process is often manual, i.e. performed by humans, and
generally involves creating mathematical models of possible agent behaviors and 
candidate protocols, then manually developing a proof that the candidate proto- 
cols are correct with respect to the desired behavior. However, human-generated 
proofs can have mistakes that may go unnoticed even after peer review, modeling 
and simulation, and testing of the resulting system. 

Formal methods have the potential to reduce such errors. However, while the 
use of formal methods in multi-agent system design is increasing [2,6,8,11], it 
is our experience that manual approaches are still the norm. Here, we hope to 
motivate the use of formal methods for multi-agent system design by demon- 
strating their value in a case study involving a manually designed decentralized 
protocol for dividing surveillance of a perimeter across multiple unmanned aerial 
vehicles (UAVs). This protocol, called the Decentralized Perimeter Surveillance 
System (DPSS), was previously published in 2008 [10], has received close to 200 
citations to date, and provides a compelling “proof” of correctness backed by 
extensive simulation results. 

We start in Sect. 2 by giving an overview of DPSS, the convergence bounds 
that comprise part of its specification, and the original “proof” of correctness. 
In Sect. 3, we give an overview of the three-UAV DPSS model we developed in 
the Assume Guarantee REasoning Environment (AGREE) model checker [3]. In 
Sect. 4, we present the analysis results returned by AGREE, including a coun- 
terexample to one of the convergence bounds. Section 5 concludes with a dis- 
cussion of benefits, challenges, and limitations of our modeling process and how 
to help overcome them, and what future work would be required to modify and 
fully verify DPSS for an arbitrary number of UAVs. 


2 Decentralized Perimeter Surveillance System (DPSS) 


UAVs can be used to perform continual, repeated surveillance of a large perime- 
ter. In such cases, more frequent coverage of points along the perimeter can be 
achieved by evenly dividing surveillance of it across multiple UAVs. However, 
coordinating this division is challenging in practice for several reasons. First, 
the exact location and length of the perimeter may not be known a priori, and 
it may change over time, as in a growing forest fire or oil spill. Second, UAVs 
might go offline and come back online, e.g. for refueling or repairs. Third, inter- 
UAV communication is unreliable, so it is not always possible to immediately 
communicate local information about perimeter or UAV changes. However, such 
information is needed to maintain an even division of the perimeter as changes 
occur. DPSS provides a method to solve this problem with minimal inter-UAV 
communication for perimeters that are isomorphic to a line segment. 

Let the perimeter start as a line segment along the x-axis with its left end- 
point at x = 0 and its right at x = P. Let N be the number of UAVs in the 
system or on the “team,” indexed from left to right as 1,...,N. Divide the 
perimeter into segments of length P/N, one per UAV. Then the optimal
configuration of DPSS as depicted in Fig. 1 is defined as follows (see Ref. [10] for
discussion of why this definition is desirable). 
Definition 1. Consider two sets of perimeter locations: (1) $\lfloor i + \frac{1}{2}(-1)^i \rfloor P/N$
and (2) $\lfloor i - \frac{1}{2}(-1)^i \rfloor P/N$, where $\lfloor \cdot \rfloor$ returns the largest integer less than or equal
to its argument. The optimal configuration is realized when UAVs synchronously
oscillate between these two sets of locations, each moving at constant speed V.




Fig. 1. Optimal DPSS configuration, in which UAVs are evenly spaced along the 
perimeter and synchronously oscillate between segment boundaries. 
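As a quick check of Definition 1, the two location sets can be computed directly. A small Python sketch; the function name is hypothetical, and UAVs are indexed from 1 to N as in the text.

```python
import math


def optimal_locations(N, P):
    """The two sets of perimeter locations of Definition 1, for UAVs i = 1..N."""
    set1 = [math.floor(i + 0.5 * (-1) ** i) * P / N for i in range(1, N + 1)]
    set2 = [math.floor(i - 0.5 * (-1) ** i) * P / N for i in range(1, N + 1)]
    return set1, set2


print(optimal_locations(3, 3.0))  # ([0.0, 2.0, 2.0], [1.0, 1.0, 3.0])
```

For N = 3 and P = 3, the first set places UAV 1 at the left endpoint while UAVs 2 and 3 meet at 2P/N; the second set has UAVs 1 and 2 meeting at P/N and UAV 3 at the right endpoint, which is the synchronized oscillation between segment boundaries shown in Fig. 1.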


The goal of DPSS is to achieve the optimal configuration in the steady state, 
i.e. when the perimeter and involved UAVs remain constant. The DPSS protocol 
itself is relatively simple. Each UAV $i$ stores a vector $\xi_i = (P_{R_i}, P_{L_i}, N_{R_i}, N_{L_i})^T$
of coordination variables that capture its beliefs (which may be incorrect) about
the perimeter length $P_{R_i}$ and $P_{L_i}$ and the number of UAVs $N_{R_i}$ and $N_{L_i}$ to its right
and left. When neighboring UAVs meet, “left” UAV $i$ learns updated values for
its “right” variables $P_{R_i} = P_{R_{i+1}}$ and $N_{R_i} = N_{R_{i+1}} + 1$ from “right” UAV $i+1$,
and likewise UAV $i+1$ updates its “left” variables $P_{L_{i+1}} = P_{L_i}$ and $N_{L_{i+1}} =
N_{L_i} + 1$. While values for these variables may still be incorrect, the two UAVs
will at least have matching coordination variables and thus a consistent estimate 
of their shared segment boundary. The two UAVs then “escort” each other to 
their estimated shared segment boundary, then split apart to surveil their own 
segment. Note that UAVs only change direction when they reach a perimeter 
endpoint or when starting or stopping an escort, which means a UAV will travel 
outside its segment unless another UAV arrives at the segment boundary at the 
same time (or the end of the segment is a perimeter endpoint). 
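The exchange when UAVs i and i+1 meet can be written as a pure function on the coordination vectors. This sketch assumes both updates read the values from before the meeting and that $P_L$ and $P_R$ are measured from the shared meeting point; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Coordination:
    P_R: float  # believed perimeter length to the right
    P_L: float  # believed perimeter length to the left
    N_R: int    # believed number of UAVs to the right
    N_L: int    # believed number of UAVs to the left


def meet(left, right):
    """Update rule when UAV i (left) meets UAV i+1 (right).

    UAV i:   P_R := P_R of UAV i+1,  N_R := N_R of UAV i+1, plus 1
    UAV i+1: P_L := P_L of UAV i,    N_L := N_L of UAV i, plus 1
    Both updates read the values from before the meeting.
    """
    new_left = Coordination(P_R=right.P_R, P_L=left.P_L, N_R=right.N_R + 1, N_L=left.N_L)
    new_right = Coordination(P_R=right.P_R, P_L=left.P_L, N_R=right.N_R, N_L=left.N_L + 1)
    return new_left, new_right


uav1 = Coordination(P_R=5.0, P_L=1.0, N_R=0, N_L=0)
uav2 = Coordination(P_R=2.0, P_L=4.0, N_R=1, N_L=3)
a, b = meet(uav1, uav2)
print(a.N_L + a.N_R + 1, b.N_L + b.N_R + 1)  # 3 3
print(a.P_L + a.P_R, b.P_L + b.P_R)          # 3.0 3.0
```

After the exchange both UAVs agree on the total number of UAVs, $N_L + N_R + 1$, and on the total perimeter length $P_L + P_R$, even if both values are still wrong; this agreement is what gives them a consistent estimate of their shared segment boundary.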

Eventually, leftmost UAV 1 will discover the actual left perimeter endpoint, 
accurately set $N_{L_1} = 0$ and $P_{L_1} = 0$, then turn around and update $P_{L_1}$
continuously as it moves. A similar situation holds for rightmost UAV N. Accurate
information will be passed along to other UAVs as they meet, and eventually all 
UAVs will have correct coordination variables and segment boundary estimates. 
Since UAVs also escort each other to shared segment boundaries whenever they 
meet, eventually the system reaches the optimal configuration, in which UAVs 
oscillate between their true shared segment boundaries. 

An important question is how long it takes DPSS to converge to the optimal 
configuration. Each time the perimeter or number of UAVs changes, it is as if the 
system is reinitialized; UAVs no longer have correct coordination variables and 
so the system is no longer converged. However, if DPSS is able to re-converge 
relatively quickly, it will often be in its converged state. 

Ref. [10] claims that DPSS converges within 5T, where T = P/V is the 
time it would take a single UAV to traverse the entire perimeter if there were no 
other UAVs in the system. It describes DPSS as two algorithms: Algorithm A, in 
which UAVs start with correct coordination variables, and Algorithm B, in which
they do not. The proof strategy is then to argue that Algorithm A converges in
2T (Theorem 1) and Algorithm B achieves correct coordination variables in 3T
(Lemma 1)¹. At that point, Algorithm B converts to Algorithm A, so the total
convergence time is 2T + 3T = 5T (Theorem 2)².


Fig. 2. Claimed worst-case coordination variable convergence for Algorithm B.


Informally, the original argument for Lemma 1 is that information takes 
time T to travel along the perimeter. The worst case occurs when all UAVs 
start near one end of the perimeter, e.g. the left endpoint, so that the rightmost 
UAV N reaches the right endpoint around time T. UAV N then turns around 
and through a fast series of meetings, correct “right” coordination variables 
are propagated to the other UAVs, all of which then start moving left. Due to 
incorrect “left” coordination variables, UAV N − 1 and UAV N might think
their shared segment boundary is infinitesimally close to the left endpoint. The 
UAVs travel left until they are almost at the left perimeter endpoint around 
time 2T. However, since UAV N thinks its segment boundary is near the left 
endpoint, it ends its escort and goes right without learning the true location of 
the left perimeter endpoint. Leftmost UAV 1 learns the true location of the left 
perimeter endpoint and this information will be passed to the other UAVs as 
they meet, but the information will have to travel the perimeter once again to 
reach the rightmost UAV N around time 3T. This situation is depicted in Fig. 2. 

Through model checking, we were able to find a counterexample to this 
claimed bound, which will be presented in Sect. 4. But first, we give an overview of the
model used for analysis through model checking.


3 Formal Models 


We briefly overview the formal models developed in AGREE for a three-UAV 
version of DPSS as described by Algorithm B. Models for Algorithm A and 


1 We label this Lemma 1 for convenience; it is unlabeled in [10]. 
2 A version of the original proof is on GitHub [1] in file dpssOriginalProof.pdf. 
Algorithm B along with a more detailed description of the Algorithm B model
are available on GitHub [1]³.

AGREE is an infinite-state model checker capable of analyzing systems with 
real-valued variables, as is the case with DPSS. AGREE uses assume/guarantee 
reasoning to verify properties of architectures modeled as a top-level system with 
multiple lower-level components, each having a formally specified assume/guar- 
antee contract. Each contract consists of a set of assumptions on the inputs 
and guarantees on the outputs, where inputs and outputs can be reals, integers, 
or booleans. System assumptions and component assume/guarantee contracts 
are assumed to be true. AGREE then attempts to verify that (a) component 
assumptions hold given system assumptions, and (b) system guarantees hold 
given component guarantees. AGREE poses this verification problem as a
satisfiability modulo theories (SMT) problem [4] and uses a k-induction model
checking approach [7] to search for counterexamples that violate system-level
guarantees given system-level assumptions and component-level assume/guarantee
contracts. The language used by AGREE is an “annex” to the Architecture 
Analysis and Design Language (AADL) [5]. 
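As background on the k-induction approach mentioned above, the following explicit-state Python sketch checks an invariant on a small finite transition system. It only illustrates the two proof obligations (base case and induction step); AGREE and JKind discharge them symbolically with an SMT solver over infinite-state models, and all names here are hypothetical.

```python
def k_induction(states, init, trans, prop, k):
    """Explicit-state k-induction for an invariant `prop` (k >= 1).

    trans maps a state to its set of successors. Returns "proved",
    ("counterexample", path), or "unknown" if the induction step fails for this k.
    """
    # Base case: every path of at most k states from an initial state satisfies prop.
    frontier = [[s] for s in init]
    for _ in range(k):
        extended = []
        for path in frontier:
            if not prop(path[-1]):
                return ("counterexample", path)
            extended.extend(path + [t] for t in trans.get(path[-1], ()))
        frontier = extended
    # Induction step: k consecutive prop-states can only be followed by a prop-state.
    paths = [[s] for s in states if prop(s)]
    for _ in range(k - 1):
        paths = [p + [t] for p in paths for t in trans.get(p[-1], ()) if prop(t)]
    for p in paths:
        if any(not prop(t) for t in trans.get(p[-1], ())):
            return "unknown"
    return "proved"


# Counter that jumps by 2 modulo 4; invariant: state 1 is never reached from 0.
def never_one(s):
    return s != 1


trans = {s: {(s + 2) % 4} for s in range(4)}
print(k_induction(range(4), {0}, trans, never_one, 1))  # unknown
print(k_induction(range(4), {0}, trans, never_one, 2))  # proved
```

With k = 1 the unreachable state 3, whose successor is 1, breaks the induction step; with k = 2 no path of two invariant-satisfying states ends in 3, so the step goes through. This illustrates why k-induction increases k when the step fails.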

AGREE’s ability to analyze systems modeled as a top-level system with 
multiple lower-level components provides a natural fit for DPSS. The three- 
UAV AGREE DPSS model consists of a single top-level system model, which 
we call the “System,” and a component-level UAV model that is instantiated 
three times, which we call the “UAV(s).” The System essentially coordinates a 
discrete event simulation of the UAVs as they execute the DPSS protocol, where 
events include a UAV reaching a perimeter endpoint or two UAVs starting or 
stopping an escort. In the initial state, the System sets valid ranges for each 
UAV’s initial position through assumptions that constrain the UAVs to be ini- 
tialized between the perimeter endpoints and ordered by ID number from left to 
right. System assumptions also constrain UAV initial directions to be either left 
or right (though a UAV might have to immediately change this value, e.g., if it 
is initialized at the left endpoint headed left). These values become inputs to the 
UAVs. The System determines values for other UAV inputs, including whether 
a UAV is co-located with its right or left neighbor and the true values for the 
left and right perimeter endpoints. Note the true perimeter endpoints are only 
used by the UAVs to check whether they have reached the end of the perime- 
ter, not to calculate boundary segment endpoints. The System also establishes 
data ports between UAVs, so that each UAV can receive updated coordination 
variable values from its left or right neighbor as inputs and use them (but only 
if they are co-located). 

The last System output that serves as a UAV input is the position of the 
UAV. At initialization and after each event, the System uses the globally known 
constant UAV speed V and other information from each UAV to determine the 
amount of time δt until the next event, and then it updates the position of each


3 AADL projects are in AADL_sandbox_projects. Algorithm A and B models for
three UAVs are in projects DPSS-3-AlgA-for-paper and DPSS-3-AlgB-for-paper. A
description of the Algorithm B model is in file modelAlgorithmB.pdf.
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UAV. Determining the time of the next event requires knowing the direction and 
next anticipated “goal” location of each UAV, e.g. estimated perimeter endpoint 
or shared segment boundary. Each UAV outputs these values, which become 
inputs to the System. Each UAV also outputs its coordination variables $P_{R_i}$, $P_{L_i}$,
$N_{R_i}$, and $N_{L_i}$, which become System inputs that are used in System guarantees
that formalize Theorem 1, Lemma 1, and Theorem 2 of Sect. 2. Note that we
bound the integers $N_{R_i}$ and $N_{L_i}$ because, in order to calculate estimated segment
boundaries, which requires dividing the perimeter length by the number of UAVs, we
must implement a lookup table that copies the values of $N_{R_i}$ and $N_{L_i}$ to real-valued
versions of these variables. This is due to an interaction between AGREE
and the Z3 SMT solver [4] used by AGREE. If we directly cast $N_{R_i}$ and $N_{L_i}$
to real values in AGREE, they are encoded in Z3 using the to_real function.
Perimeter values $P_{R_i}$ and $P_{L_i}$ are directly declared as reals. However, Z3 views
integers converted by the to_real function as constrained to have integer values,
so it cannot use the specialized solver for reals that is able to analyze this model.
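The System's event step described in this section (compute δt from the speed, directions, and goals, then advance every UAV) can be sketched as follows. This is a simplified, hypothetical rendering, not the AGREE model: it only considers two kinds of events, a UAV reaching its goal and two neighboring UAVs moving toward each other meeting, and it ignores escort bookkeeping.

```python
def next_event(positions, directions, goals, V):
    """Time until the next event and the UAV positions at that time.

    positions: UAV positions ordered left to right; directions: +1 (right) or -1 (left);
    goals: next anticipated goal location of each UAV; V: common constant speed.
    Only two event kinds are modeled (a simplification): a UAV reaching its goal,
    and two neighbors heading toward each other meeting.
    """
    candidates = [abs(g - x) / V for x, g in zip(positions, goals)]
    for i in range(len(positions) - 1):
        if directions[i] == 1 and directions[i + 1] == -1:
            gap = positions[i + 1] - positions[i]
            candidates.append(gap / (2 * V))  # closing speed is 2V
    dt = min(candidates)
    new_positions = [x + d * V * dt for x, d in zip(positions, directions)]
    return dt, new_positions


print(next_event([0.0, 6.0], [1, -1], [10.0, 0.0], 1.0))  # (3.0, [3.0, 3.0])
```

Because all UAVs move at the same constant speed between events, advancing by the minimum candidate time is exact: no event can be skipped, which is what makes a discrete event simulation of DPSS possible.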


4 Formal Analysis Results 


In this section, we discuss the analysis results provided by AGREE for Algo- 
rithm A and Algorithm B, though we focus on Algorithm B. 


Algorithm A: Using AGREE configured to utilize the JKind k-induction model 
checker [7] and the Z3 SMT solver, we have proven Theorem 1, that Algorithm A 
converges within 2T, for N = 1, 2, 3, 4, 5, and 6 UAVs. Computation time pre- 
vented us from analyzing more than six UAVs. For reference, N = 1 through 
N = 4 ran in under 10 min each on a laptop with two cores and 8 GB RAM. The
same laptop analyzed N = 5 overnight. For N = 6, the analysis took approxi- 
mately twenty days on a computer with 40 cores and 128 GB memory. 


Algorithm B: We were able to prove Theorem 2, that DPSS converges within 
5T, for N = 1, 2, and 3 UAVs and with each UAV’s coordination variables $N_{R_i}$
and $N_{L_i}$ bounded between 0 and 20. In fact, we found the convergence time to
be within (4+ 4T). However, AGREE produced a counterexample to Lemma 1, 
that every UAV obtains correct coordination variables within 3T, for N = 3. In 
fact, we incrementally increased this bound and found counterexamples up to 
(3+ 5)T but that convergence is guaranteed in (3 + 3)T. 

One of the shorter counterexamples provided by AGREE shows the UAVs 
obtaining correct coordination variables in 3.0129T. Full details are available on 
GitHub [1],⁴ but we outline the steps in Fig. 3. In this counterexample, UAV 1
starts very close to the left perimeter heading right, and UAVs 2 and 3 start 
in the middle of segment 3 headed left. UAVs 1 and 2 meet near the middle 
of the perimeter and head left toward what they believe to be their shared 
segment boundary. This is very close to the left perimeter endpoint because, 
due to initial conditions, they believe the left perimeter endpoint to be much 


4 A spreadsheet with counterexample values for all model variables is located under 
AADL_sandbox_projects/DPSS-3-AlgB-for-paper/results_20180815_eispi.


372 J. A. Davis et al. 


farther away than it actually is. Then they split, and UAV 1 learns where the 
left perimeter endpoint actually is, but UAV 2 does not. UAV 2 heads right and 
meets UAV 3 shortly afterward, and they move to what they believe to be their 
shared segment boundary, which is likewise very close to the right perimeter 
endpoint. Then they split, and UAV 3 learns where the right perimeter endpoint 
is, but UAV 2 does not. UAV 2 heads left, meets UAV 1 shortly after, and 
learns correct “left” coordination variables. However, UAV 2 still believes the 
right perimeter endpoint to be farther away than it actually is, so UAVs 1 and 2
estimate their shared segment boundary to be near the middle of the perimeter.
They then head toward this point and split apart, with UAV 1 headed left and
still not having correct “right” coordination variables. UAVs 2 and 3 then meet,
exchange information, and now both have correct coordination variables. They
go to their actual shared boundary, split apart, and UAV 2 heads left toward
UAV 1. UAVs 1 and 2 then meet on segment 1, exchange information, and now
all UAVs have correct coordination variables. 

The counterexample reveals a key intuition that was missing in Lemma 1. 
The original argument did not fully consider the effects of initial conditions and 
so only considered a case in which UAVs came close to one end of the perimeter 
without actually reaching it. The counterexample shows it can happen at both 
ends if initial conditions cause the UAVs to believe the perimeter endpoints to 
be farther away than they actually are. This could happen if the perimeter were 
to quickly shrink, causing the system to essentially “reinitialize” with incorrect 
coordination variables. 
Fig. 3. Counterexample to Lemma 1. Dots to the left of a UAV number indicate it has
correct “left” variables, and likewise for the right. 


Analysis for three UAVs for Algorithm B completed in 18 days on a machine 
with 256 GB RAM and 80 cores. 
5 Discussion and Conclusions


Formal modeling and analysis through AGREE had many benefits. First, it 
allowed us to analyze DPSS, a decentralized protocol for distributing a surveil- 
lance task across multiple UAVs. Though the original publication on DPSS pro- 
vided a convincing human-generated proof and simulation results to support 
claims about its convergence bounds, analysis revealed that one of the key lem- 
mas was incorrect. Furthermore, the counterexample returned by AGREE pro- 
vided insight into why it was incorrect. Second, formal modeling in and of itself 
allowed us to find what were essentially technical typos in the original paper. For 
example, the formula for dividing the perimeter across UAVs only accounted for 
changes in estimates of the right perimeter endpoint and not the left, so we cor- 
rected the formula for our model. We also discovered that certain key aspects of 
the protocol were underspecified. In particular, it is unclear what should happen 
if more than two UAVs meet at the same time. Analysis showed this occurring 
for as few as three UAVs in Algorithm B, and simulations in the original paper
showed this happening frequently, but this behavior was not explicitly described. 
Here, we decided that if all three UAVs meet to the left of UAV 3’s estimated 
segment, UAV 3 immediately heads right and the other two follow the normal 
protocol to escort each other to their shared border. Otherwise, the UAVs all 
travel left together to the boundary between segments 2 and 3, then UAV 3 
breaks off and heads right while the other two follow the normal protocol. 

This brings us to a discussion of challenges and limitations. First, in terms 
of more than two UAVs meeting at a time, simulations in the original paper 
implement a more complex behavior in which UAVs head to the closest shared 
boundary and then split apart into smaller and smaller groups until reaching the 
standard case of two co-located UAVs. This behavior requires a more complex 
AGREE model that can track “cliques” of more than two UAVs, and it is difficult 
to validate the model due to long analysis run times. Second, we noted in Sect. 4 
that in our model, UAV coordination variables Nr, and Nz, have an upper 
bound of 20. In fact, with an earlier upper bound of 3, we found the bound for 
Lemma 1 to be (3 + aT and did not consider that it would depend on upper 
bounds for Np, and Nz,. We therefore cannot conclude that even (3 + 3)T is 
the convergence time for Lemma 1. Third and related to the last point, model 
checking with AGREE can only handle up to three UAVs for Algorithm B. Due 
to these limitations, we cannot say for sure what the upper bound for DPSS
actually is, even if we believe it to be 5T. If it is higher, then it takes DPSS
longer to converge, meaning it can handle less frequent changes than originally 
believed. We are therefore attempting to transition to theorem provers such as 
ACL2 [9] and PVS [12] to develop a proof of convergence bounds for an arbitrary 
number of UAVs, upper bound on Nr, and Nz,, and perimeter length (which 
was set to a fixed size to make the model small enough to analyze). 

In terms of recommendations and lessons learned, it was immensely useful to 
work with the author of DPSS to formalize our model. Multi-agent protocols like 
DPSS are inherently complex, and it is not surprising that the original paper 
contained some typos, underspecifications, and errors. In fact, the original paper 
explains DPSS quite well and is mostly correct, but it is still challenging for
formal methods experts to understand complex systems from other disciplines, 
so access to subject matter experts can greatly speed up formalization. 


Acknowledgment. We thank John Backes for his guidance on efficiently modeling 
DPSS in AGREE and Aaron Fifarek for running some of the longer AGREE analyses. 
Abstract. NUXMV is a well-known symbolic model checker, which implements
various state-of-the-art algorithms for the analysis of finite- and infinite-state
transition systems and temporal logics. In this paper, we present a new version that
supports timed systems and logics over continuous super-dense semantics. The
system specification was extended with clocks to constrain the timed evolution.
The support for temporal properties has been expanded to include MTL_{0,∞}
formulas with parametric intervals. The analysis is performed via a reduction to
verification problems in the discrete-time case. The internal representation of
traces has been extended to go beyond the lasso-shaped form, to take into account
the possible divergence of clocks. We evaluated the new features by comparing
NUXMV with other verification tools for timed automata and MTL_{0,∞},
considering different benchmarks from the literature. The results show that NUXMV is
competitive with and in many cases performs better than state-of-the-art tools,
especially on validity problems for MTL_{0,∞}.


1 Introduction 


NUXMV [1] is a symbolic model checker for the analysis of synchronous finite- and 
infinite-state transition systems. For the finite-state case, NUXMV features strong ver- 
ification engines based on state-of-the-art SAT-based algorithms. For the infinite-state 
case, NUXMV features SMT-based verification techniques, implemented through a tight
integration with the MathSAT5 solver [2]. NUXMV has taken part in recent editions
of the hardware model checking competition, where it proved very competitive
with the state of the art. NUXMV also compares well with other model checkers
for infinite-state systems. Moreover, it has been successfully used in several applica- 
tion domains both in research and industrial settings. It is currently the core verification 
engine for many other tools (also industrial ones) for requirements analysis, contract 
based design, model checking of hybrid systems, safety assessment, and software model 
checking. 

In this paper, we put emphasis on the novel extensions to NUXMV to support timed 
synchronous transition systems, which extend symbolically-represented infinite-state 
transition systems with clocks. The main novelties of this new version are the follow- 
ing. The NUXMV input language was extended to enable the description of symbolic 
Fig. 1. The high level architecture of NUXMV.


synchronous timed transition systems with super-dense time semantics (where signals
can have a sequence of values at any real time t). The support for temporal
properties has been expanded to include MTL_{0,∞} formulas with parametric intervals [3,4].
Therefore, NUXMV now supports model checking of invariant, LTL and MTL_{0,∞}
properties over (symbolic) timed transition systems, as well as validity/satisfiability
checking of LTL and MTL_{0,∞} formulas. This is done via a correct and complete reduction
to verification problems in the discrete-time case (thus allowing for the use of mature
and efficient verification engines). In order to represent and find infinite traces where
clocks may diverge, we extended the representation for lasso-shaped traces (over discrete
semantics) and we modified the bounded model checking algorithm to properly encode
timed traces. We remark that NUXMV is more expressive than timed automata, since
the native management of time is added on top of an infinite-state transition system.
This makes it straightforward to encode stopwatches and comparisons between clocks.
We carried out an experimental evaluation comparing NUXMV with other state-of-the-art
verification tools for timed automata, considering different benchmarks taken from
the distributions of competitor tools.


2 Software Architecture 


The high level architecture of NUXMV is depicted in Fig. 1. For symbolic transition sys- 
tems NUXMV behaves like the previous version of the system [1], thus allowing for full 
backward compatibility (apart from some new reserved keywords). It provides the user 
with all the basic model checking algorithms for finite domains both using BDDs (using 
CUDD [5]) and SAT (e.g. MINISAT [6]). It supports various SMT-based model
checking algorithms (implemented through a tight integration with the MathSAT5 solver
[2]) for the analysis of finite and infinite state systems (e.g. IC3 [7-9], k-liveness [10],
liveness to safety [11]). We refer the reader to [1] for a thorough discussion of these
consolidated functionalities for the discrete-time setting.


```
@TIME_DOMAIN continuous -- annotation to specify the time semantics, in this case dense time

MODULE main
FROZENVAR p : real; INIT p > 0 -- parameter
VAR i : real; -- input of the sensor
VAR s : Sensor(i);
VAR m : Monitor(s.o, p);

LTLSPEC G (s.fault -> F [0, p] m.alarm) -- any fault is detected in p time units

MODULE Sensor(i)
VAR o : real;
VAR fault : boolean;
TRANS !fault -> next(o) = i -- if not faulty, the sensor directly outputs its input
TRANS fault -> next(o) = o -- if faulty, the sensor output is stuck at the last value
TRANS fault -> next(fault) -- the fault is permanent

MODULE Monitor(i, p)
VAR previous_value : real;
VAR c : clock;
VAR alarm : boolean;
INIT c = 0 & previous_value = i & !alarm
INVAR TRUE -> c <= p
TRANS time <= p | time >= p
TRANS (c = p & next(c) = 0 & next(previous_value) = i) | -- the monitor reads the sensor every p time units
      (c <= p & next(c) = c & next(previous_value) = previous_value)
TRANS next(alarm) <-> (alarm | i = previous_value) -- alarm raised when the same value is read twice consecutively
```


Fig. 2. A simple TIMED-NUXMV program. 


To support the specification and model checking of invariant, LTL and MTL_{0,∞}
properties for timed transition systems, and the validity checking of properties over
dense time semantics, NUXMV has been extended w.r.t. [1] as follows.


- We extended the parser to allow the user to choose the time semantics of the
  model being read. Depending on the time model, some parser constructs and checks
  are enabled and/or disabled. For instance, variables of type clock and MTL_{0,∞}
  properties are only allowed if the dense time semantics has been specified. By default
  the system uses the discrete time semantics of the original NUXMV. Notice also that,
  depending on the specified semantics, the commands available to the user change to
  allow only the analyses supported for the chosen semantics.

- We extended the parser to support the specification of symbolic timed automata
  (definition of clock variables, specification of urgent transitions and state invariants,
  etc.). Moreover, we extended the parser to allow for the specification of MTL_{0,∞}
  properties, and we extended the LTL bounded operators to accept not only constants
  but also complex expressions over clock variables. See Fig. 2 for a simple
  example showing some of the new language constructs.

- We extended the symbol table to support the specification of clock variables, and we
  extended the type checker to properly handle the newly defined variables, expression
  types and language constructs.

- We added new modules for the encoding of the symbolic timed automata into equivalent
  transition systems to verify with the existing algorithms of NUXMV.

- We extended the traces for NUXMV to support timed traces (lasso-shaped traces
  where some clock variables may diverge).

- We modified the encoding for the loops in the bounded model checking algorithms
  to take into account that traces may contain diverging variables, allowing for the
  verification and validation of LTL and MTL_{0,∞} properties.


For portability, NUXMV has been developed mainly in standard C with some new 
parts in standard C++. It compiles and executes on Linux, MS Windows, and MacOS. 


3 Language Extensions 


Timed Transition Systems. Discrete-time transition systems are described in NUXMV
by a set V of variables, an initial condition I(V), a transition condition T(V, V’) and
an invariant condition Z(V). Variables are introduced with the keyword VAR and can
have type Boolean, scalar, integer, real or array. The initial and the invariant conditions
are introduced with the keywords INIT and INVAR and are expressions over the
variables in V. The transition condition is introduced with TRANS and is an expression over
variables in V and V’, where for each variable v in V, V’ contains the “next” version 
denoted in the language by next(v). Expressions may use standard symbols in the the- 
ory associated to the variable types and user-defined rigid functions that are declared 
with the keyword FUN. 
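A discrete-time model of this shape can be mimicked with three predicates, one per keyword, and a finite path checked against them. The following Python sketch is purely illustrative: `TransitionSystem`, `is_path` and the toy counter are hypothetical names invented here and do not reflect NUXMV's internals or API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, int]  # assignment of values to the variables in V


@dataclass
class TransitionSystem:
    init: Callable[[State], bool]          # I(V), the INIT condition
    trans: Callable[[State, State], bool]  # T(V, V'), second state gives next(v)
    invar: Callable[[State], bool]         # Z(V), the INVAR condition

    def is_path(self, states: List[State]) -> bool:
        """True iff states is a finite path: starts in I, stays in Z, steps satisfy T."""
        return (
            len(states) > 0
            and self.init(states[0])
            and all(self.invar(s) for s in states)
            and all(self.trans(s, t) for s, t in zip(states, states[1:]))
        )


# Hypothetical model, roughly:
#   VAR x : integer; INIT x = 0; INVAR x >= 0;
#   TRANS next(x) = x + 1 | next(x) = 0
counter = TransitionSystem(
    init=lambda s: s["x"] == 0,
    trans=lambda s, t: t["x"] == s["x"] + 1 or t["x"] == 0,
    invar=lambda s: s["x"] >= 0,
)

print(counter.is_path([{"x": 0}, {"x": 1}, {"x": 0}]))  # True
print(counter.is_path([{"x": 0}, {"x": 2}]))            # False
```

Representing T as a relation over a pair of states (rather than as a function) mirrors the fact that TRANS constraints may be non-deterministic, as in the disjunction above.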

The input language of NUXMV has been extended to allow the specification of 
timed transition systems (TTS), which are enabled by the annotation @TIME_DOMAIN 
continuous at the beginning of a model description. 

Besides the standard types, in the timed case, state variables can be declared of type 
clock. All variables of type different from clock are discrete variables. 

The language provides a built-in clock variable, accessible through the reserved 
keyword time. It represents the amount of time elapsed from the initial state until now. 
time is initialized to 0 and its value does not change in discrete transitions. While all 
other clock variables can be used in any expression in the model definition, time can 
be used only in comparison with constants. 

Initial, transition, and invariant conditions are specified in NUXMV with the
keywords INIT, TRANS, and INVAR, as in the discrete case. In particular, TRANS makes it
possible to specify “arbitrary” clock resets. Like all other NUXMV state variables, if a clock is not
constrained during a discrete transition, its next value is chosen non-deterministically. 

Clock variables can be used in INVAR only in the form φ -> ψ, where φ is a
formula built using only the discrete variables and ψ is convex over the clock variables.
This closely maps the concept of location invariant described for timed automata: all
locations satisfying φ have invariant ψ.

An additional constraint, not allowed in the discrete-time case, is introduced with 
the keyword URGENT followed by a predicate over the discrete variables, which makes it
possible to specify a set of locations in which time cannot elapse.


Comparison with Timed Automata. Timed automata can be represented by TTSs by 
simply introducing a variable representing the locations of the automaton. Note that, 
in TTS, it is possible to express any kind of constraint over clock variables in discrete 
transitions, while in timed automata it is only possible to reset them to 0 in
transitions or compare them to constants in guards. Moreover, the discrete variables of a
timed automaton always have a finite domain, while in TTSs the discrete variables
may also have an infinite domain. This additional expressiveness makes it possible to
describe more complex behaviors (e.g. it is straightforward to encode stopwatches and
comparisons between clocks), at the cost of losing the decidability of the model checking problem.


Specifications. NUXMV’s support for LTL has been extended to allow for the use of
MTL_{0,∞} operators [12] and other operators such as event-freezing functions [13] and
dense versions of the LTL X and Y operators. MTL_{0,∞} bounded operators extend the LTL
ones of NUXMV to allow for bounds either of the form [c, +∞), where c is a constant
greater than or equal to 0, e.g. F [0, +∞) p, or given by generic expressions over
parametric/frozen variables, e.g. F [0, 3+v] φ, where v is a frozen variable.

In the timed setting, next and previous operators come in two possible versions. The
standard LTL operators X and Y require their argument to hold, respectively, after and
before a discrete transition. Dually, X~ and Y~ have been introduced to predicate about the
evolution over time of the system. They are always FALSE in discrete steps and hold in
time elapses if the argument holds in the open interval immediately after/before (resp.)
the current step. The disjunction X(φ) ∨ X~(φ) makes it possible to check whether the
argument φ holds after the current state, without distinguishing between timed and discrete evolution.

The event-freezing operators at next and at last, written @F~ and @O~, are binary 
operators allowed in LTL specifications. The left-hand side is a term, while the right- 
hand side is a temporal formula. They return the value of the term respectively at the 
next and at the last point in time in which the formula is true. If the formula will never
hold (for at next) or has never held (for at last), the operator evaluates to a default value.

time_until and time_since are two additional unary operators that can be used 
in LTL specifications of timed models. Their argument must be a Boolean predicate 
over current and next variables. time_until(φ) evaluates to the amount of time elapse
required to reach the next state in which φ holds, while time_since(φ) evaluates to
the amount of time elapsed since the last state in which φ held. As for the @F~ and @O~
operators, if no such state exists they evaluate to a default value.
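To make the intended meaning of these operators concrete, here is a simplified Python sketch that evaluates them on a finite list of time-stamped states. It is not the NUXMV semantics: it ignores super-dense time and open intervals, treats the argument as a predicate over a single state (rather than over current and next variables), looks only at strictly later/earlier points, and picks -1.0 and None as default values. All of these choices, and the function names, are assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List, Tuple

Point = Tuple[float, Dict[str, Any]]  # (absolute time, variable valuation)
Pred = Callable[[Dict[str, Any]], bool]
Term = Callable[[Dict[str, Any]], Any]


def time_until(trace: List[Point], i: int, pred: Pred, default: float = -1.0) -> float:
    """Time to elapse from point i until the next later point where pred holds."""
    t_i = trace[i][0]
    for t, s in trace[i + 1:]:
        if pred(s):
            return t - t_i
    return default


def time_since(trace: List[Point], i: int, pred: Pred, default: float = -1.0) -> float:
    """Time elapsed since the last earlier point where pred held."""
    t_i = trace[i][0]
    for t, s in reversed(trace[:i]):
        if pred(s):
            return t_i - t
    return default


def at_next(trace: List[Point], i: int, term: Term, pred: Pred, default: Any = None) -> Any:
    """Value of term at the next later point where pred holds (cf. @F~)."""
    for _, s in trace[i + 1:]:
        if pred(s):
            return term(s)
    return default


def at_last(trace: List[Point], i: int, term: Term, pred: Pred, default: Any = None) -> Any:
    """Value of term at the last earlier point where pred held (cf. @O~)."""
    for _, s in reversed(trace[:i]):
        if pred(s):
            return term(s)
    return default


trace = [
    (0.0, {"o": 1.0, "alarm": False}),
    (0.5, {"o": 2.0, "alarm": False}),
    (2.0, {"o": 2.0, "alarm": True}),
]
alarm = lambda s: s["alarm"]
print(time_until(trace, 0, alarm))                    # 2.0
print(at_next(trace, 0, lambda s: s["o"], alarm))     # 2.0
print(time_since(trace, 2, lambda s: s["o"] == 1.0))  # 2.0
print(time_until(trace, 2, alarm))                    # -1.0 (default: no later alarm)
```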


4 Extending Traces 


Timed Traces. The semantics of NUXMV has been extended to take into account the 
timing aspects in case of super-dense time. While in the discrete time case, the exe- 
cution trace is given by a sequence of states connected by discrete transitions (i.e., 
satisfying the transition condition), in the super-dense time case the execution trace is 
such that every pair of consecutive states is a discrete or a timed transition. As in the 
discrete case, discrete transitions are pairs of states satisfying the transition condition.
As in timed automata, in a timed transition time elapses by a certain amount (referred
to as delta_time), clocks increase by the same amount, while discrete variables do not
change.
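The two kinds of steps can be made concrete with a small Python sketch over explicit states. It is an illustration under stated assumptions, not NUXMV code: states are dictionaries, `timed_step` and `is_discrete_step` are hypothetical helpers, delta is assumed non-negative, and the set of clock names must be supplied by the caller.

```python
from typing import Callable, Dict, FrozenSet

State = Dict[str, float]


def timed_step(s: State, delta: float, clocks: FrozenSet[str]) -> State:
    """Timed transition: time elapses by delta; every clock, including the
    built-in 'time', increases by delta; discrete variables keep their value."""
    if delta < 0:
        raise ValueError("delta_time must be non-negative")
    return {v: val + delta if v in clocks else val for v, val in s.items()}


def is_discrete_step(s: State, t: State, trans: Callable[[State, State], bool]) -> bool:
    """Discrete transition: (s, t) satisfies the transition condition, and the
    built-in 'time' does not change."""
    return t["time"] == s["time"] and trans(s, t)


CLOCKS = frozenset({"time", "c"})
s0 = {"time": 0.0, "c": 0.0, "alarm": 0.0}
s1 = timed_step(s0, 1.5, CLOCKS)
print(s1)  # {'time': 1.5, 'c': 1.5, 'alarm': 0.0}
# Hypothetical discrete step resetting clock c (as in "next(c) = 0"):
reset_c = lambda s, t: t["c"] == 0 and t["alarm"] == s["alarm"]
print(is_discrete_step(s1, {**s1, "c": 0.0}, reset_c))  # True
```

Keeping `time` among the clocks means it advances in timed steps like any other clock, while `is_discrete_step` enforces that it stays fixed across discrete steps.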


Lasso-Shaped Traces with Diverging Variables. Traditionally, the only infinite paths 
supported by NUXMV have been those in lasso shape, i.e. those traces which can be 
represented by a finite prefix s_0, s_1, ..., s_l (called the stem) followed by a finite suffix
s_{l+1}, ..., s_k = s_l (called the loop), which can be repeated infinitely many times. While
this representation is sufficient for finite-state systems (because in a finite-state setting if
a system does not satisfy an LTL property, then a lasso-shaped counter-example trace
is guaranteed to exist), this is an important limitation in an infinite-state context, in
which lasso-shaped counter-examples are not guaranteed to exist. (As a simple example,
consider a system M := ({x}, (x = 0), (x' = x + 1)) in which x ∈ Z. Then M ⊭
GF (x = 0), but clearly M has no lasso-shaped trace). In fact, this is especially relevant
for timed transition systems, which, by the presence of the always-diverging variable 
time, admit no lasso-shaped trace. 

In order to overcome this limitation, we introduce new kinds of infinite traces, which
we call lasso-shaped traces with diverging variables (to allow also for representing traces
with variables whose value might be diverging). We modified the bounded model checking
algorithms to leverage this new representation and thereby extend their ability to
find witnesses for a given property. This representation significantly extends the
capabilities of NUXMV to find witnesses for violated LTL and MTL properties on timed
transition systems (see experimental evaluation).


Definition 1. Let π := s_0, s_1, ..., s_l, ... be an infinite trace of a system M over
variables V. We say that π is a lasso-shaped trace with diverging variables iff there exist
indexes 0 ≤ l < k, a partitioning of V into sets X and Y (V = X ⊎ Y) and an
expression f_y(V) over V for every variable y ∈ Y such that, for every i > k,

$$
s_i(v) =
\begin{cases}
s_{l + ((i - l) \bmod (k - l))}(v) & \text{if } v \in X \text{ (like in lasso-shaped traces)};\\
f_v(s_{i-1}) & \text{if } v \in Y \text{ (as a function of the previous state)}.
\end{cases}
$$


Intuitively, the idea of lasso-shaped traces with diverging variables is to provide a
finite representation for infinite traces that is more general than simple lasso-shaped
ones, and which can capture more interesting behaviors of timed transition systems.
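Definition 1 can be read as a recipe for generating an infinite trace from finitely many states. The hypothetical Python helper `unroll` below (a sketch, not part of NUXMV) expands such a representation up to a chosen length. Applied to the counter M := ({x}, (x = 0), (x' = x + 1)) from the example above, with X = ∅, Y = {x}, f_x(x) = x + 1, l = 0 and k = 1, it produces the trace 0, 1, 2, ... that has no plain lasso-shaped representation.

```python
from typing import Callable, Dict, List, Set

State = Dict[str, int]


def unroll(prefix: List[State], l: int, X: Set[str],
           f: Dict[str, Callable[[State], int]], n: int) -> List[State]:
    """Expand a finite representation s_0..s_k (k = len(prefix) - 1) into s_0..s_{n-1}.

    For i > k, following Definition 1:
      v in X:  s_i(v) = s_{l + ((i - l) mod (k - l))}(v)   (repeat the loop)
      y in Y:  s_i(y) = f_y(s_{i-1})                         (Y = keys of f)
    """
    k = len(prefix) - 1
    if not 0 <= l < k:
        raise ValueError("need 0 <= l < k")
    trace = [dict(s) for s in prefix]
    for i in range(k + 1, n):
        s = {v: trace[l + (i - l) % (k - l)][v] for v in X}
        s.update({y: fy(trace[i - 1]) for y, fy in f.items()})
        trace.append(s)
    return trace[:n]


# Counter M := ({x}, x = 0, x' = x + 1), with X = {} and Y = {x}.
xs = [s["x"] for s in unroll([{"x": 0}, {"x": 1}], l=0, X=set(),
                             f={"x": lambda s: s["x"] + 1}, n=6)]
print(xs)  # [0, 1, 2, 3, 4, 5]
```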


Example 1. Consider the system M := ({y, b}, ¬b ∧ y = 0, (b' = ¬b) ∧ (b → y' =
y + 1) ∧ (¬b → y' = y)). Then one trace for M is given by π := s_0, s_1, s_2,
where s_0 := {b ↦ ⊥, y ↦ 0}, s_1 := {b ↦ ⊤, y ↦ 0}, and s_2 := {b ↦ ⊥, y ↦ 1};
the trace is lasso-shaped with diverging variables considering Y := {y}, the loop-back
at index 0, and f_y(b, y) := b ? y + 1 : y.
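Example 1 can be checked mechanically. The Python sketch below (the helper names `init`, `trans`, `f_y` and `expand` are hypothetical) encodes the initial and transition conditions of M, unrolls π according to Definition 1 with X = {b}, Y = {y}, l = 0 and k = 2, and confirms that the first 20 states form a valid path of M.

```python
from typing import Any, Dict, List

State = Dict[str, Any]


# Example 1: M := ({y, b}, ¬b ∧ y = 0, (b' = ¬b) ∧ (b → y' = y + 1) ∧ (¬b → y' = y))
def init(s: State) -> bool:
    return (not s["b"]) and s["y"] == 0


def trans(s: State, t: State) -> bool:
    return (t["b"] == (not s["b"])
            and (not s["b"] or t["y"] == s["y"] + 1)   # b -> y' = y + 1
            and (s["b"] or t["y"] == s["y"]))          # ¬b -> y' = y


PREFIX: List[State] = [{"b": False, "y": 0}, {"b": True, "y": 0}, {"b": False, "y": 1}]
L = 0  # loop-back index; k = len(PREFIX) - 1 = 2


def f_y(s: State) -> int:
    return s["y"] + 1 if s["b"] else s["y"]


def expand(n: int) -> List[State]:
    """Unroll: b (in X) repeats the loop, y (in Y) is f_y of the previous state."""
    k = len(PREFIX) - 1
    trace = [dict(s) for s in PREFIX]
    while len(trace) < n:
        i = len(trace)
        trace.append({"b": trace[L + (i - L) % (k - L)]["b"], "y": f_y(trace[i - 1])})
    return trace


pi = expand(20)
ok = init(pi[0]) and all(trans(s, t) for s, t in zip(pi, pi[1:]))
print(ok, [s["y"] for s in pi[:7]])  # True [0, 0, 1, 1, 2, 2, 3]
```

The check only covers a finite prefix, but because b repeats with period k - l = 2 and y follows f_y, every further step has the same shape as the ones tested.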


Extended BMC for Traces with Divergent Clocks. The definition above requires the
existence of the functions f_y for computing the updates of diverging variables. In case
y is a clock variable, we can define a region ψ_y in which y can diverge (i.e., f_y = y + δ,
where δ is the delta-time variable).

In order to capture lasso-shaped traces with diverging variables, we can modify
the BMC encoding as follows. Let

$$
\bigvee_{l=0}^{k-1} \Big( \bigwedge_{v \in X \uplus Y} (v^{l} = v^{k}) \wedge [\![\varphi]\!]_{k}^{l} \Big)
$$

be the formula representing the BMC encoding of [14] at depth k with all possible
loop-backs 0 ≤ l < k for a given formula φ. The encoding is extended as follows:

$$
\bigvee_{l=0}^{k-1} \Big( \bigwedge_{x \in X} (x^{l} = x^{k}) \wedge \bigwedge_{y \in Y} \bigwedge_{i=l}^{k} \psi_y^{i} \wedge [\![\varphi]\!]_{k}^{l} \Big)
$$


The correctness of the encoding relies on a safe choice of the set Y, falling back to 
the incomplete lasso-shaped case when some syntactic restrictions on the expressions 
containing clocks are not met (see appendix for more details). 


5 Related Work 


There are many tools that allow for the specification and verification of infinite state 
symbolic synchronous transition systems. Given the focus of this paper, here we restrict 
our attention to tools supporting timed systems and/or MTL properties. 

Uppaal [15], the reference tool for timed systems verification, supports only 
bounded variable types and therefore finite asynchronous TTS. Properties are limited to 
a subset of the branching-time logic TCTL [16,17]. LTSmin [18] and Divine [19] are 
two model checkers that support the Uppaal specification language and properties
specified in LTL. RTD-Finder [20] handles only safety properties for real-time
component-based systems specified in RT-BIP. The verification is based on a compositional
computation of an invariant over-approximating the set of reachable states of the system and
leverages a counterexample-based invariant refinement algorithm. The ZOT Bounded
Model/Satisfiability Checker [21] supports different logic languages through a
multi-layered approach based on LTL with past operators. Similarly to NUXMV, ZOT
supports dense-time MTL. It relies only on SMT-based Bounded Model Checking, and
is therefore unable to prove that properties hold. Atmoc [22] implements an extension of
IC3 [7] and k-induction [23] to deal with symbolic timed transition systems. It supports
both invariant and MTL_{0,∞} properties, although for the latter it only supports bounded
model checking. CTAV [24] reduces the model checking problem for an MTL_{0,∞}
property φ to a symbolic language-emptiness check of a timed Büchi automaton for φ.

Unlike all the above tools, NUXMV is able to prove MTL_{0,∞} properties
on timed transition systems with infinite-domain variables.


6 Experimental Evaluation 


We compared NUXMV with Atmoc [22], CTAV [24], ZOT [21], Divine [19], LTSmin 
[18], and Uppaal [25]. 

For the evaluation we considered (i) scalable benchmarks taken from the distributions of
competitor tools and from the literature; (ii) handcrafted benchmarks to stress various
language features. In particular, we considered different versions of the Fischer mutual
exclusion protocol (correct and buggy) with different properties, and different versions of
the emergency diesel generator problem (previously studied with Atmoc [22]). Finally,
we also considered validity checks of some MTL properties, also taken from [22].
Fig. 3. Runtime for the Fischer mutual exclusion problem; x-axis: number of processes, y-axis:
time (s). LTL-1 and MTL-1 properties are the bounded versions of LTL-0 and MTL-0, respectively.


We ran all the experiments on a PC equipped with a 3.7 GHz Xeon quad-core CPU and
16 GB of RAM, using a time/memory limit of 1000 s/10 GB for each test. We refer the
reader to [26] to retrieve all the data to reproduce this experimental evaluation. 
Fig. 5. Runtime (s) for the validity checks of MTL properties.


7 Conclusions 


We presented the new version of NUXMV, a state-of-the-art symbolic model checker
for finite and infinite-state transition systems, which we extended to allow for the
specification of synchronous timed transition systems and of MTL_{0,∞} properties. To
support the new features, we extended the NUXMV language, we allowed for the
specification of MTL_{0,∞} formulas with parametric intervals, and we adapted the model
checking algorithms to search for lasso-shaped traces (over discrete semantics) where clocks may
diverge. We evaluated the new features by comparing NUXMV with other verification tools
for timed automata, considering different benchmarks. The results show that NUXMV
is competitive with and in many cases performs better than state-of-the-art tools,
especially on validity problems for MTL_{0,∞}.
Abstract. C remains central to our infrastructure, making verification
of C code an essential and much-researched topic, but the semantics of 
C is remarkably complex, and important aspects of it are still unsettled, 
leaving programmers and verification tool builders on shaky ground. This 
paper describes a tool, Cerberus-BMC, that for the first time provides a 
principled reference semantics that simultaneously supports (1) a choice 
of concurrency memory model (including substantial fragments of the 
C11, RC11, and Linux kernel memory models), (2) a modern memory 
object model, and (3) a well-validated thread-local semantics for a large 
fragment of the language. The tool should be useful for C programmers, 
compiler writers, verification tool builders, and members of the C/C++ 
standards committees. 


1 Introduction 


C remains central to our infrastructure, widely used for security-critical com- 
ponents of hypervisors, operating systems, language runtimes, and embedded 
systems. This has prompted much research on the verification of C code, but 
the semantics of C is remarkably complex, and important aspects of it are still 
unsettled, leaving programmers and verification tool builders on shaky ground. 
Here we are concerned with three aspects: 


1. The Concurrency Memory Model. The 2011 versions of the ISO C++ and 
C standards adopted a new concurrency model [3, 12,13], formalised during the 
development process [11], but the model is still in flux: various fixes have been 
found to be necessary [9,14,26]; the model still suffers from the “thin-air prob- 
lem” [10,15,35]; and Linux kernel C code uses a different model, itself recently 
partially formalised [7]. 
I. Dillig and S. Tasiran (Eds.): CAV 2019, LNCS 11561, pp. 387-397, 2019.
https://doi.org/10.1007/978-3-030-25540-4_22


388 S. Lau et al. 


2. The Memory Object Model. A priori, one might imagine C follows one of two 
language-design extremes: a concrete byte-array model with pointers that are 
simply machine words, or an abstract model with pointers combining abstract 
block IDs and structured offsets. In fact C is neither of these: it permits casts 
between pointer and integer types, and manipulation of their byte representa- 
tions, to support low-level systems programming, but, while at runtime a C 
pointer will typically just be a machine word, compiler analyses and optimi- 
sations reason about abstract notions of the provenance of pointers [27, 29,31]. 
This is a subject of active discussion in the ISO C and C++ committees and in 
compiler development communities. 
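The gap between "a pointer is just a machine word" and compilers' provenance reasoning can be illustrated with a deliberately tiny Python model in the spirit of PNVI (provenance-not-via-integers), the memory object model that Cerberus-BMC builds on: pointers carry an allocation ID next to their address, integers carry none, and an integer-to-pointer cast re-derives provenance from the live allocation containing the address. All names are hypothetical, allocations are assumed to be laid out adjacently, and the sketch ignores many details of the actual model of [29], such as the treatment of one-past-the-end addresses and byte representations.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Ptr:
    prov: Optional[int]  # allocation ID, or None for empty provenance
    addr: int            # concrete address, as a machine word would hold it


class Memory:
    """Toy PNVI-style memory: provenance lives on pointers, never on integers."""

    def __init__(self) -> None:
        self.allocs: Dict[int, Tuple[int, int]] = {}  # allocation ID -> (base, size)
        self.cells: Dict[int, int] = {}               # address -> stored value
        self._next_id, self._next_addr = 0, 0x1000

    def alloc(self, size: int) -> Ptr:
        aid, base = self._next_id, self._next_addr
        self.allocs[aid] = (base, size)
        self._next_id += 1
        self._next_addr += size  # allocations placed adjacently (an assumption)
        return Ptr(aid, base)

    def ptr_to_int(self, p: Ptr) -> int:
        return p.addr  # the provenance is dropped: integers carry none

    def int_to_ptr(self, a: int) -> Ptr:
        # Re-derive provenance from the live allocation containing the address.
        for aid, (base, size) in self.allocs.items():
            if base <= a < base + size:
                return Ptr(aid, a)
        return Ptr(None, a)

    def valid(self, p: Ptr) -> bool:
        """An access is allowed only within the allocation named by p's provenance."""
        if p.prov is None or p.prov not in self.allocs:
            return False
        base, size = self.allocs[p.prov]
        return base <= p.addr < base + size

    def store(self, p: Ptr, v: int) -> None:
        if not self.valid(p):
            raise RuntimeError("undefined behaviour: access without matching provenance")
        self.cells[p.addr] = v

    def load(self, p: Ptr) -> int:
        if not self.valid(p):
            raise RuntimeError("undefined behaviour: access without matching provenance")
        return self.cells[p.addr]


m = Memory()
x, y = m.alloc(4), m.alloc(4)
m.store(y, 2)
q = Ptr(x.prov, x.addr + 4)  # x's provenance, but numerically equal to y's address
print(q.addr == y.addr, m.valid(q))  # True False
r = m.int_to_ptr(m.ptr_to_int(q))    # round trip through an integer picks up y's provenance
m.store(r, 11)
print(m.load(y))  # 11
```

The point of the example is that q and y hold the same machine word yet only one of them may be used to access y's storage, which is exactly the kind of distinction an executable memory object model has to make.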


3. The Thread-Local Sequential Semantics. Here, there are many aspects, e.g. the 
loosely specified evaluation order, the semantics of integer promotions, many 
kinds of undefined behaviour, and so on, that are (given an expert reading) 
reasonably well-defined in the standard, but that are nonetheless very complex 
and widely misunderstood. The standard, being just a prose document, is not 
executable as a test oracle; it is not a reference semantics usable for exploration 
or automated testing. 


Each of these is challenging in isolation, but there are also many subtle 
interactions between them. For example, between (1) and (3), the pre-C11 ISO 
standard text was in terms of sequential stepwise execution of an (informally 
specified) abstract machine, while the C11 concurrency model is expressed as 
a predicate over complete candidate executions, and the two have never been 
fully reconciled — e.g. in the standard’s treatment of object lifetimes. Then there 
are fundamental issues in combining the ISO treatment of undefined behaviour 
with that axiomatic-concurrency-model style [10, §7]. Between (1) and (2), one 
has to ask about the relationships between the definition of data race and the 
treatment of uninitialised memory and padding. Between (2) and (3), there are 
many choices for what the C memory object model should be, and how it should 
be integrated with the standard, which are currently under debate. Between all 
three one has to consider the relationships between uninitialised and thin-air 
values and the ISO notions of unspecified values and trap representations. These 
are all open questions in what the C semantics and ISO standard are (or should 
be). We do not solve them here, but we provide a necessary starting point: a 
tool embodying a precise reference semantics that lets one explore examples and 
debate the alternatives. 
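The axiomatic style mentioned above, a model as a predicate over complete candidate executions, can be illustrated with a toy Python sketch: enumerate the candidate executions of the classic store-buffering (SB) litmus test, where each read may read from any write to its location, then keep only those satisfying a consistency predicate. Here the predicate is sequential consistency, expressed as acyclicity of po ∪ rf ∪ co ∪ fr; this is a stand-in chosen for brevity, not the C11, RC11, or Linux kernel model, whose predicates are considerably more involved. Event names are invented for the sketch.

```python
from itertools import product
from typing import Callable, Dict, Iterator, Set, Tuple

Edge = Tuple[str, str]

# Store buffering (SB), x = y = 0 initially:
#   Thread 0: x = 1; r0 = y;        Thread 1: y = 1; r1 = x;
WRITES = {"x": ["Wx0", "Wx1"], "y": ["Wy0", "Wy1"]}  # per location, in coherence order
VALUE = {"Wx0": 0, "Wx1": 1, "Wy0": 0, "Wy1": 1}
READS = {"Ry": "y", "Rx": "x"}                        # read event -> location read
PO: Set[Edge] = {("Wx1", "Ry"), ("Wy1", "Rx")}        # program order within each thread


def candidate_executions() -> Iterator[Dict[str, str]]:
    """Every reads-from choice: each read may read any write to its location."""
    reads = list(READS)
    for choice in product(*(WRITES[READS[r]] for r in reads)):
        yield dict(zip(reads, choice))


def has_cycle(edges: Set[Edge]) -> bool:
    graph: Dict[str, Set[str]] = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
    state: Dict[str, int] = {}  # 1 = on the DFS stack, 2 = finished

    def visit(n: str) -> bool:
        state[n] = 1
        for m in graph.get(n, ()):
            if state.get(m) == 1 or (m not in state and visit(m)):
                return True
        state[n] = 2
        return False

    return any(n not in state and visit(n) for n in list(graph))


def sc_consistent(rf: Dict[str, str]) -> bool:
    """Sequential consistency as a predicate: acyclic(po | rf | co | fr)."""
    edges = set(PO)
    for ws in WRITES.values():
        edges |= set(zip(ws, ws[1:]))                      # co
    for r, w in rf.items():
        edges.add((w, r))                                  # rf
        ws = WRITES[READS[r]]
        edges |= {(r, w2) for w2 in ws[ws.index(w) + 1:]}  # fr
    return not has_cycle(edges)


def allowed_outcomes(model: Callable[[Dict[str, str]], bool]) -> Set[Tuple[int, int]]:
    return {(VALUE[rf["Ry"]], VALUE[rf["Rx"]]) for rf in candidate_executions() if model(rf)}


print(sorted(allowed_outcomes(sc_consistent)))    # [(0, 1), (1, 0), (1, 1)]
print(sorted(allowed_outcomes(lambda rf: True)))  # all four, including (0, 0)
```

Swapping `sc_consistent` for another predicate changes the model without touching the enumeration, which is the property that lets a tool like Cerberus-BMC be parameterised on the concurrency model.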

We describe a tool, Cerberus-BMC, that for the first time lets one explore 
the allowed behaviours of C test programs that involve all three of the above. It 
is available via a web interface at http://cerberus.cl.cam.ac.uk/bmc.html.

For (1), Cerberus-BMC is parameterised on an axiomatic memory concur- 
rency model: it reads in a definition of the model in a Herd-like format [6], and so 
can be instantiated with (substantial fragments of) either the C11 [3,9, 12-14], 
RC11 [26], or Linux kernel [7] memory models. The model can be edited in the
web interface. Then the user can load (or edit in the web interface) a small C
program. The tool first applies the Cerberus compositional translation (or
elaboration) into a simple Core language, as in [29,31]; this elaboration addresses (3)
by making many of the thread-local subtleties of C explicit, including the loose 
specification of evaluation order, arithmetic conversions, implementation-defined 
behaviour, and many kinds of undefined behaviour. Core computation is simply 
over mathematical integers, with explicit memory actions to interface with the 
concurrency and memory object models. However, there is a mismatch between
the axiomatic style of the concurrency models for C (expressed as predicates
on arbitrary candidate executions) and the operational style of the previous
thread-local operational semantics for Core. We address this by replacing the
latter with a new translation from Core into SMT problems. This is integrated 
with the concurrency model, also translated into SMT, following the ideas of [5]. 
These are furthermore integrated with an SMT version of parts of the PNVI 
(provenance-not-via-integers) memory object model of [29], the basis for ongo- 
ing work within the ISO WG14 C standards committee, addressing (2). The 
resulting SMT problems are passed to Z3 [32]. The web interface then provides 
a graphical view of the allowed concurrent executions for small test programs. 
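As an illustration of what making arithmetic conversions explicit over mathematical integers can look like, the following hypothetical Python sketch evaluates the C comparison `-1 < 1u`. Assuming a 32-bit `unsigned int`, the usual arithmetic conversions convert the `int` operand to `unsigned int` (reduction modulo 2^32), so the comparison is false. The helper names are invented for this sketch and are not Cerberus Core constructs.

```python
UINT_BITS = 32  # assumption: unsigned int is 32 bits wide


def conv_to_unsigned(v: int, bits: int = UINT_BITS) -> int:
    """Convert a mathematical integer to an unsigned type of the given width:
    reduce modulo 2**bits (C11 6.3.1.3p2)."""
    return v % (1 << bits)


def lt_int_uint(a: int, b: int) -> bool:
    """Elaborated form of `a < b` with a : int and b : unsigned int.

    int and unsigned int have the same rank, so the usual arithmetic
    conversions convert a to unsigned int before comparing."""
    return conv_to_unsigned(a) < b


print(-1 < 1)                # True: comparison of mathematical integers
print(conv_to_unsigned(-1))  # 4294967295
print(lt_int_uint(-1, 1))    # False: the C semantics of -1 < 1u
```

Once the conversion is an explicit step, the remaining comparison is ordinary integer arithmetic, which is what makes a translation of such code to SMT straightforward.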

The Cerberus-BMC tool should be useful for programmers, compiler writers, 
verification tool builders, and members of the C/C++ standards committees. 
We emphasise that it is intended as an executable reference semantics for small 
test programs, not itself as a verification tool that can be applied to larger bodies 
of C: we have focussed on making it transparently based on principled semantics 
for all three aspects, without the complexities needed for a high-performance 
verification tool. But it should aid the construction of such. 


Caveats and Limitations. Cerberus-BMC covers many features of 1-3, but far 
from all. With respect to the concurrency memory model, we support substan- 
tial fragments of the C11, RC11, and Linux kernel memory models. We omit 
locks and the (deprecated) C11/RC11 consume accesses. We only cover compare- 
exchange read-modify-write operations, and the fragment of RCU restricted to 
rcu_read_lock(), rcu_read_unlock(), and synchronize_rcu() used in a linear
way, without control-flow-dependent calls to RCU, and without nesting. 

With respect to the memory object model, we do not currently support 
dynamic allocation or manipulation of byte representations (such as with char*
pointers), and we do not address issues such as subobject provenance (an open 
question within WG14). 

With respect to the thread semantics, our translation to SMT does not cur- 
rently cover arbitrary pointer type-casting, function pointers, multi-dimensional 
arrays, unions, floating point, bitwise operations, and variadic functions, and 
only covers simple structs. In addition, we inherit the limitations of the Cer- 
berus thread semantics as per [29]. 


Related Work. There is substantial prior work on tools for concurrency semantics 
and for C semantics, but almost none that combines the two. On the concurrency 
semantics side, CppMem [1,11] is a web-interface tool that computes the allowed 
concurrent behaviours of small tests with respect to variants (now somewhat 
outdated) of the C11 model, but it does not support other concurrency models or a memory object model, and it supports only a small fragment of C. Herd [6,8] is a command-line tool that computes the allowed concurrent behaviours of small tests with respect to arbitrary axiomatic concurrency models expressed in its cat language, but without a memory object model and for tests which essentially just comprise memory events, without a C semantics. MemAlloy [38] and MemSynth [16] also support reasoning about axiomatic concurrency models, but again not integrated with a C language semantics.

On the C semantics side, several projects address sequential C semantics 
but without concurrency. We build here on Cerberus [28, 29,31], a web-interface 
tool that computes the allowed behaviours (interactively or exhaustively) for 
moderate-sized tests in a substantial fragment of sequential C, incorporating 
various memory object models (an early version supported Nienhuis’s opera- 
tional model for C11 concurrency [33], but that is no longer integrated). KCC 
and RV-Match [19, 21,22] provide a command-line semantics tool for a substan- 
tial fragment of C, again without concurrency. Krebbers gives a Coq semantics 
for a somewhat smaller fragment [24]. 

Then there is another large body of work on model-checking tools for sequen- 
tial and concurrent C. These are all optimised for model-checking performance, 
in contrast to the Cerberus-BMC emphasis on expressing the semantic envelope 
of allowed behaviour as clearly as we can (and, where possible, closely linked 
to the ISO standard). The former include tis-interpreter [18,36], CBMC [17,25], 
and ESBMC [20]. On the concurrent side, as already mentioned, we build on 
the approach of [5], which integrated various hardware memory concurrency 
models with CBMC. CDSChecker [34] supports something like the C/C++11 
concurrency model, but subject to various limitations [34, §1.3]. It is imple- 
mented using a dynamically-linked shared library for the C and C++ atomic 
types, so implicitly adopts the C semantic choices of whichever compiler is used. 
RCMC [23] supports memory models that do not exhibit Load Buffering (LB),
for an idealised thread-local language. Nidhugg [4] supports only hardware mem- 
ory models: SC, TSO, PSO, and versions of POWER and ARM. 


2 Examples 
We now illustrate some of what Cerberus-BMC can do, by example. 


Concurrency Models. First, for C11 concurrency, Fig. 1 shows a screenshot for a 
classic message-passing test, with non-atomic writes and reads of x, synchronised 
with release/acquire writes and reads of y. The test uses an explicit parallel 
composition, written {-{...|||...}-}, to avoid the noise from the extra memory 
actions in pthread_create. The consistent race-free UB-free execution on the 
right shows the synchronisation working correctly: after the i read-acquire of y=1, 
the l non-atomic read of x has to read x=1 (there are no consistent executions 
in which it does not). As usual in C/C++ candidate execution graphs, rf are 
reads-from edges, sb is sequenced-before (program order), mo is modification 


#include <stdatomic.h>

int x = 0;
_Atomic(int) y = 0;
int r1, r2;
{-{ {
  x = 1;
  atomic_store_explicit(
    &y, 1, memory_order_release);
} ||| {
  r1 = atomic_load_explicit(
    &y, memory_order_acquire);
  if (r1 == 1)
    r2 = x;
  else
    r2 = 2;
} }-}

[Screenshot of the Cerberus-BMC web interface showing example.c (the program above), the console output "# consistent executions: 2" and "# executions with races: 0", and the BMC view of execution 2 of 2, which includes the event i:Racq y=1.]


Fig. 1. Cerberus-BMC Screenshot: C11 Release/Acquire Message Passing. If the read 
of y is 1, then the last thread has to see the write of 1 to x. 


#include "linux.h"
int main() {
  int x = 0, y = 0;
  int r1, r2;
  {-{ {
    WRITE_ONCE(x, 1);
    // synchronize_rcu();
    WRITE_ONCE(y, 1);
  } ||| {
    rcu_read_lock();
    r1 = READ_ONCE(x);
    r2 = READ_ONCE(y);
    rcu_read_unlock();
  } }-}
  assert(!(r1==0 && r2==1));
}

[Execution graph with initialisation writes e:Wna x=0 and f:Wna y=0, the events g:Wonce x=1 and h:Wonce y=1 of the first thread, the events i:Frculock, j:Ronce x=0, k:Wna r1=0, l:Ronce y=1, m:Wna r2=1, and n:Frcuunlock of the second thread, and the final reads o:Rna r1=0, p:Rna r2=1, q:Rna r1=0, r:Rna r2=1, related by sb, rf, and crit edges.]
Fig. 2. Linux kernel memory model RCU lock. Without synchronize_rcu(), the reads of x and y can see 0 and 1 (as shown), even though they are enclosed in an RCU lock. With synchronize_rcu(), if the last thread reads y=1 then it has to have read x=1, so the assertion holds.
order (the coherence order between atomic writes to the same address), and asw is additional-synchronised-with, between parent and child threads and vice versa. Read and write events (R/W) are annotated na for non-atomic and rel/acq for release/acquire.

For the Linux kernel memory model, the example in Fig.2 shows an RCU 
(read-copy-update) synchronisation. 


Memory Object Model. The example below illustrates a case where one cannot assume that C has a concrete memory object model: pointer provenance matters. In some C implementations, x and y will happen to be allocated adjacent (the __BMC_ASSUME restricts attention to those executions). Then &x+1 will have the same numeric address as &y, but the write *p=11 is undefined behaviour rather than a write to y. This was informally described in the 2004 ISO WG14 C standards committee response to Defect Report 260 [37], but has never been incorporated into the standard itself. Cerberus-BMC correctly reports UB found: source.c:8:5-7, UB043_indirection_invalid_value following the PNVI (provenance-not-via-integers) memory object model of [29].

#include <stdint.h>
int x = 1, y = 2;
int main() {
  int *p = &x + 1;
  int *q = &y;
  __BMC_ASSUME((intptr_t)p==(intptr_t)q);
  if ((intptr_t)p==(intptr_t)q)
    *p = 11; // does this have UB?
}


ISO Subtleties. Turning to areas where the ISO standard is clear to experts but widely misunderstood, in the example below ISO leaves it implementation-defined whether char is signed or unsigned. In the former case, the ISO integer promotion and conversion semantics will make the equality test false, leading to a division by 0, which is undefined behaviour.

int main() {
  char c1 = 0xff;
  unsigned char c2 = 0xff;
  return 1 / (c1 == c2);
}

The example below shows the correct treatment of the ISO standard's loose specification of evaluation order, together with detection of the concurrency model's unsequenced races (ur in the diagram): there are write and read accesses to x that are unrelated by sequenced-before (sb), and not otherwise synchronised and hence unrelated by happens-before, which makes this program undefined behaviour.

int main() {
  int x=0;
  int w;
  w = x++ + x;
}

[Diagram: execution graph with events c:Wna x=0, e:Wna x=1, f:Rna x=0, and g:Wna w=0, related by rf, mo, and ur edges.]
Treiber Stack. Finally, demonstrating the combination of all three aspects, we implemented a modified Treiber stack (the push() function is shown in Fig. 3) with relaxed accesses to struct fields. The Treiber stack is traditionally implemented by spinning on a compare-and-swap; as that can spin unboundedly, we instead use __BMC_ASSUME to restrict executions to those where the compare-and-swap succeeds. Our tool correctly detects the different possible results of threads executing the push and pop functions concurrently under the relaxed memory model.


#include <stdatomic.h>
struct Node { int data; struct Node *next; };
struct Node * _Atomic T;
void push(struct Node *x, int v) {
  struct Node *t;
  x->data = v;
  t = atomic_load_explicit(&T, memory_order_relaxed);
  x->next = t;
  __BMC_ASSUME(atomic_compare_exchange_strong_explicit(&T, &t, x,
      memory_order_acq_rel, memory_order_relaxed));
}


Fig. 3. Treiber stack push() 
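For contrast with Fig. 3, the traditional Treiber push retries the compare-and-swap until it succeeds. The following is a minimal sketch in standard C11 <stdatomic.h>, not the code checked with Cerberus-BMC: the name push_retry is hypothetical, and a weak compare-exchange inside a loop is one common idiom (it may fail spuriously, which the loop absorbs). The unbounded retry loop is exactly what a bounded model checker cannot unwind completely, which is why Fig. 3 assumes success instead.

```c
#include <stdatomic.h>
#include <stddef.h>

struct Node { int data; struct Node *next; };

/* Stack top: an atomic pointer, declared as in Fig. 3. */
struct Node *_Atomic T = NULL;

/* Traditional Treiber push (hypothetical name push_retry): retry the
   compare-and-swap until it succeeds. On failure, the compare-exchange
   stores the current value of T into t, so the loop re-links x to the
   new top before trying again. */
void push_retry(struct Node *x, int v) {
  x->data = v;
  struct Node *t = atomic_load_explicit(&T, memory_order_relaxed);
  do {
    x->next = t;
  } while (!atomic_compare_exchange_weak_explicit(
               &T, &t, x, memory_order_acq_rel, memory_order_relaxed));
}
```

Run on a single thread, two pushes leave the second node on top of the stack, linked to the first.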


1 proc main (): eff loaded integer :=
2   let strong x: pointer = create(Ivalignof('signed int'), 'signed int') in
3   let strong a_437: loaded integer = pure(Specified(1)) in
4   store('signed int', x, conv_loaded_int('signed int', a_437));
5   kill(x);
6   (save ret_435: loaded integer (a_436: loaded integer := Specified(0)) in
7     pure(a_436))

Fig. 4. Core program corresponding to int main(){int x = 1;}. Core is essentially a typed, first-order lambda calculus with explicit memory actions such as create and store to interface with the concurrency and memory object models.


3 Implementation 


After translating a C program into Core (see Fig.4), Cerberus-BMC does a 
sequence of Core-to-Core rewrites in the style of bounded model checkers such 
as CBMC: it unwinds loops and inlines function calls (to a given bound), and 
renames symbols to generate an SSA-style program. 
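To make the unwinding step concrete, here is its effect written back in C. This is an illustrative sketch only: Cerberus-BMC performs the rewrite on Core rather than on C source, and the bound of 3 and the function names are hypothetical.

```c
#include <stdbool.h>

/* Original loop, as a programmer would write it. */
int sum_loop(int n) {
  int s = 0;
  for (int i = 0; i < n; i++)
    s += i;
  return s;
}

/* The same loop unwound to bound 3: each iteration becomes a nested `if`
   on the loop condition. After the last copy, *bound_exceeded records
   whether the loop would have needed more iterations; a bounded model
   checker turns this point into an unwinding assertion or assumption. */
int sum_unwound3(int n, bool *bound_exceeded) {
  int s = 0, i = 0;
  *bound_exceeded = false;
  if (i < n) {
    s += i; i++;
    if (i < n) {
      s += i; i++;
      if (i < n) {
        s += i; i++;
        *bound_exceeded = (i < n);
      }
    }
  }
  return s;
}
```

Within the bound the two functions agree; beyond it the unwound version silently truncates, which is why the flag matters.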

The explicit representation of memory operations in Core as first-order con- 
structs allows the SMT translation to be easily separated into three components: 
the translation from Core to SMT, the memory object model constraints, and 
the concurrency model constraints. 
1. Core to SMT. Each value in Core is represented as an SMT expression, with
fresh SMT constants for memory actions such as create and store (e.g. lines 
2 and 4), the concrete values of which are constrained by the memory object 
and concurrency models. The elaboration of C to Core makes thread-local unde- 
fined behaviour (as opposed to undefined behaviour from concurrency or memory 
layout), like signed integer overflow, explicit with a primitive undef construct. 
Undefined behaviour is then encoded in SMT as reachability of undef expres- 
sions, that is, satisfiability of the control-flow guards up to them. 
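As an illustration of undefined behaviour as reachability of undef, consider signed addition. The sketch below expresses, as an ordinary C predicate, the condition under which an undef guarding a + b would be reached; in Cerberus-BMC this would instead be an SMT formula conjoined with the control-flow guard and handed to Z3. The function names are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* Condition under which the signed addition a + b overflows. The check
   is written so that it never overflows itself. */
bool add_overflows(int a, int b) {
  return (b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b);
}

/* The undef is reachable only if the control-flow guard leading to the
   addition holds and the overflow condition holds. */
bool undef_reachable(bool path_guard, int a, int b) {
  return path_guard && add_overflows(a, b);
}
```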


2. Memory Object Model. As in the PNVI semantics [30], Cerberus-BMC represents pointers as pairs (π, a) of a provenance π and an integer address a. The
provenance of a pointer is taken into account when doing memory accesses, 
pointer comparisons, and casts between integer and pointer values. Our tool 
models address allocation nondeterminism by constraining address values based 
on allocations to be appropriately aligned and non-overlapping, but not con- 
straining the addresses otherwise. 
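A minimal sketch of the (π, a) representation and the two kinds of constraint described above, with concrete numbers in place of SMT variables. The record layout, the single alignment parameter, and the function names are assumptions for illustration; PNVI itself has more cases (for example integer-to-pointer casts) that are not modelled here. The test replays the &x+1 example from Sect. 2.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical allocation record: provenance id, base address, size. */
typedef struct { int id; uintptr_t base; size_t size; } Alloc;

/* A pointer value as a (provenance, address) pair. */
typedef struct { int prov; uintptr_t addr; } Ptr;

/* Allocation constraint: every base is aligned and no two allocations
   overlap. Addresses are otherwise left unconstrained. */
bool allocs_ok(const Alloc *a, size_t n, uintptr_t align) {
  for (size_t i = 0; i < n; i++) {
    if (a[i].base % align != 0) return false;
    for (size_t j = i + 1; j < n; j++) {
      bool disjoint = a[i].base + a[i].size <= a[j].base ||
                      a[j].base + a[j].size <= a[i].base;
      if (!disjoint) return false;
    }
  }
  return true;
}

/* An access of len bytes through p is defined only if p's provenance names
   an allocation and the accessed bytes lie inside that allocation; the
   numeric address alone is not enough. */
bool access_ok(const Alloc *a, size_t n, Ptr p, size_t len) {
  for (size_t i = 0; i < n; i++)
    if (a[i].id == p.prov)
      return p.addr >= a[i].base && p.addr + len <= a[i].base + a[i].size;
  return false; /* no matching allocation */
}
```

With x at 0x1000 and y at 0x1004, the pointers &x+1 and &y have the same address but different provenance, so only the access through &y is allowed.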


3. Concurrency Model. Cerberus-BMC statically extracts memory actions and computes an extended pre-execution containing relations such as program order. As control flow cannot be statically determined, memory actions are associated with an SMT boolean guard representing the control flow conditions upon which the memory action is executed.

Cerberus-BMC reads in a model definition in a subset of the herd cat lan- 
guage large enough to express C11, RC11, and Linux, and generates a set of 
quantifier-free SMT expressions corresponding to the model’s constraints on 
relations. These constraints are based on a set of “built-in” relations defined 
in SMT such as rf. Cerberus-BMC then queries Z3 to extract all the executions, 
displaying the load/store values and computed relations for the user. 
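Models written in cat state consistency requirements such as the acyclicity of a union of relations. Cerberus-BMC turns such requirements into SMT constraints over symbolic relations; the sketch below instead evaluates one acyclicity check on a single concrete candidate execution, only to show what the constraint means. The relation names follow herd (po, rf, fr); the message-passing events and the choice of an SC-style union of po, rf, and fr edges are assumptions for illustration.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_EV 8

/* rel[i][j] is true iff the relation has an edge from event i to event j.
   Returns true iff the relation restricted to events 0..n-1 has no cycle,
   using Warshall's transitive closure and checking the diagonal. */
bool acyclic(int n, bool rel[MAX_EV][MAX_EV]) {
  bool c[MAX_EV][MAX_EV];
  memcpy(c, rel, sizeof c);
  for (int k = 0; k < n; k++)
    for (int i = 0; i < n; i++)
      for (int j = 0; j < n; j++)
        if (c[i][k] && c[k][j]) c[i][j] = true;
  for (int i = 0; i < n; i++)
    if (c[i][i]) return false;
  return true;
}
```

For message passing with events a: Wx=1, b: Wy=1, c: Ry=1, d: Rx=0, the po and rf edges alone are acyclic, but adding the from-reads edge d to a closes a cycle, so that outcome is forbidden under this union.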


4 Validation 


We validate correctness of the three aspects of Cerberus-BMC as follows, though, 
as ever, additional testing would be desirable. Performance data, demonstrating 
practical usability, is from a MacBook Pro 2.9GHz Intel Core i5. 

For C11 and RC11 concurrency, we check on 12 classic litmus tests. For Linux 
kernel concurrency, we hand-translated the 9 non-RCU tests and 4 of the RCU 
tests of [7] into C, and automatically translated the 40 tests of [2]. Running all 
the non-RCU tests takes less than 5 min; the RCU tests are slower, of the order 
of one hour, perhaps because of the recursive definitions involved. 

For the memory object model, we take the supported subset (36 tests) of the 
provenance semantics test suite of [29]. These single-threaded tests each run in 
less than a second. 

For the thread-local semantics, the Cerberus pipeline to Core has previously 
been validated using GCC Torture, Toyota ITC, KCC, and Csmith-generated 
test suites [29]. We check the mapping to BMC using 50 hand-written tests and 
the supported subset (400 tests) of the Toyota ITC test suite, each running in less than two minutes.


These test suites and the examples in the paper can be accessed via the CAV 2019 pop-up in the File menu of the tool.
Abstract. Hybrid system falsification is an actively studied topic, as a scalable
quality assurance methodology for real-world cyber-physical systems. In falsifi- 
cation, one employs stochastic hill-climbing optimization to quickly find a coun- 
terexample input to a black-box system model. Quantitative robust semantics is 
the technical key that enables use of such optimization. In this paper, we tackle the 
so-called scale problem regarding Boolean connectives that is widely recognized 
in the community: quantities of different scales (such as speed [km/h] vs. rpm, or 
worse, rph) can mask each other’s contribution to robustness. Our solution con- 
sists of integration of the multi-armed bandit algorithms in hill climbing-guided 
falsification frameworks, with a technical novelty of a new reward notion that we 
call hill-climbing gain. Our experiments show our approach’s robustness under 
the change of scales, and that it outperforms a state-of-the-art falsification tool. 


1 Introduction 


Hybrid System Falsification. Quality assurance of cyber-physical systems (CPS) is 
attracting growing attention from both academia and industry, not only because it is 
challenging and scientifically interesting, but also due to the safety-critical nature of 
many CPS. The combination of physical systems (with continuous dynamics) and dig- 
ital controllers (that are inherently discrete) is referred to as hybrid systems, capturing 
an important aspect of CPS. Verifying hybrid systems is intrinsically hard, because the
continuous dynamics therein leads to infinite search spaces.

More researchers and practitioners are therefore turning to optimization-based falsi- 
fication as a quality assurance measure for CPS. The problem is formalized as follows. 
The falsification problem
- Given: a model M (that takes an input signal u and yields an output signal M(u)), and a specification φ (a temporal formula)
- Find: a falsifying input, that is, an input signal u such that the corresponding output M(u) violates φ


In optimization-based falsification, the above problem is turned into an optimization problem. It is robust semantics of temporal formulas [12,17] that makes it possible. Instead of the Boolean satisfaction relation $w \models \varphi$, robust semantics assigns a quantity $[\![w, \varphi]\!] \in \mathbb{R} \cup \{\infty, -\infty\}$ that tells us not only whether $\varphi$ is true or not (by the sign), but also how robustly the formula is true or false. This allows one to employ hill-climbing optimization: we iteratively generate input signals, in the direction of decreasing robustness, hoping that eventually we hit negative robustness.


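Table 1. Boolean satisfaction $w \models \varphi$, and quantitative robustness values $[\![w, \varphi]\!]$, of three signals of speed for the STL formula $\varphi = \Box_{[0,30]}(\mathit{speed} < 120)$

[Table body: three plots of speed signals w against the threshold 120, with rows for the signal w, for $w \models \varphi$, and for $[\![w, \varphi]\!]$.]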

An illustration of robust semantics is in Table 1. We use signal temporal logic (STL) [12], a temporal logic that is commonly used in hybrid system specification. The specification says the speed must always be below 120 during the time interval [0,30]. In the search for an input signal u (e.g. of throttle and brake) whose corresponding output M(u) violates the specification, the quantitative robustness $[\![M(u), \varphi]\!]$ gives much more information than the Boolean satisfaction $M(u) \models \varphi$. Indeed, in Table 1, while Boolean satisfaction fails to discriminate the first two signals, the quantitative robustness indicates a tendency that the second signal is closer to violation of the specification.
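The robustness in Table 1 can be computed directly on a sampled signal: the atom speed < 120 contributes the margin 120 − speed(t), and □_[0,30] takes the infimum, which over finitely many samples is a minimum. A minimal sketch, assuming the signal is given as at least one sample over [0,30]; the function name and the example signals in the test are made up and are not the ones plotted in Table 1.

```c
/* Robustness of □(speed < limit) on a signal sampled at n >= 1 time points:
   the minimum over samples of limit - speed(t). A negative result means
   the specification is violated. */
double rob_always_speed_below(const double *speed, int n, double limit) {
  double r = limit - speed[0];
  for (int t = 1; t < n; t++)
    if (limit - speed[t] < r) r = limit - speed[t];
  return r;
}
```

Two signals that both satisfy the specification can still be told apart: a margin of 2 is much closer to violation than a margin of 30.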

In the falsification literature, stochastic algorithms are used for hill-climbing opti- 
mization. Examples include simulated annealing (SA), globalized Nelder-Mead (GNM 
[30]) and covariance matrix adaptation evolution strategy (CMA-ES [6]). Note that 
the system model M can be black-box: we have only to observe the correspondence 
between input u and output M (u). Observing an error M(u’) for some input u’ is suf- 
ficient evidence for a system designer to know that the system needs improvement. 
Besides these practical advantages, optimization-based falsification is an interesting 
scientific topic: it combines two different worlds of formal reasoning and stochastic 
optimization. 
Optimization-based falsification started in [17] and has been developed vigorously
[1,3-5,9, 11-13, 15, 27,28, 34, 36,38]. See [26] for a survey. There are mature tools such 
as Breach [11] and S-Taliro [5]; they work with industry-standard Simulink models. 


Challenge: The Scale Problem in Boolean Superposition. In the field of hybrid 
falsification—and more generally in search-based testing—the following problem is 
widely recognized. We shall call the problem the scale problem (in Boolean superposi- 
tion). 

Consider an STL specification $\varphi \equiv \Box_{[0,30]}(\neg(\mathit{rpm} > 4000) \vee (\mathit{speed} > 20))$ for a car; it is equivalent to $\Box_{[0,30]}((\mathit{rpm} > 4000) \rightarrow (\mathit{speed} > 20))$ and says that the speed should not be too small whenever the rpm is over 4000. According to the usual definition in the literature [11,17], the Boolean connectives $\neg$ and $\vee$ are interpreted by $-$ and the supremum $\sqcup$, respectively; and the "always" operator $\Box_{[0,30]}$ is interpreted by the infimum. Therefore the robust semantics of $\varphi$ under the signal (rpm, speed), where $\mathit{rpm}, \mathit{speed}\colon [0,30] \to \mathbb{R}$, is given as follows.

$$[\![(\mathit{rpm}, \mathit{speed}), \varphi]\!] = \inf_{t \in [0,30]} \bigl( (4000 - \mathit{rpm}(t)) \sqcup (\mathit{speed}(t) - 20) \bigr) \qquad (1)$$


A problem is that, in the supremum of two real values in (1), one component can totally 
mask the contribution of the other. In this specific example, the former (rpm) compo- 
nent can have values as big as thousands, while the latter (speed) component will be 
in the order of tens. This means that in hill-climbing optimization it is hard to use the 
information of both signals, as one will be masked. 

Another related problem is that the efficiency of a falsification algorithm would depend on the choice of units of measure. Imagine replacing rpm with rph in (1), which turns the constant 4000 into 240000 and makes the situation even worse.
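The masking effect in (1) is easy to reproduce numerically. In the sketch below (hypothetical sample values, with the rpm threshold passed as a parameter), the rpm term dominates the maximum at every sample, so changing speed leaves the robustness unchanged; rescaling rpm to rph multiplies the dominating term by 60.

```c
#include <math.h>

/* Robustness of □(¬(rpm > th) ∨ (speed > 20)) following (1): the minimum
   over samples of max(th - rpm(t), speed(t) - 20). */
double rob_rpm_speed(const double *rpm, const double *speed, int n, double th) {
  double r = INFINITY;
  for (int t = 0; t < n; t++) {
    double a = th - rpm[t];      /* robustness of ¬(rpm > th) */
    double b = speed[t] - 20.0;  /* robustness of speed > 20 */
    double v = a > b ? a : b;    /* ∨ as maximum */
    if (v < r) r = v;            /* □ as minimum */
  }
  return r;
}
```

With rpm samples 1000, 1500, 2000, the robustness is 2000 whether speed is constantly 5 or constantly 15; expressed in rph with threshold 240000, the same signals give 120000.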

These problems, which we call the scale problem, occur in many falsification examples, specifically when a specification involves Boolean connectives. We do need Boolean connectives in specifications: for example, many real-world specifications in industry are of the form $\Box_I(\varphi_1 \rightarrow \varphi_2)$, requiring that an event $\varphi_1$ triggers a countermeasure $\varphi_2$ all the time.

One could use different operators for interpreting Boolean connectives. For example, in [21], $\vee$ and $\wedge$ are interpreted by $+$ and $\times$ over $\mathbb{R}$, respectively. However, these choices do not resolve the scale problem, either. In general, it does not seem easy to come up with a fixed set of operators over $\mathbb{R}$ that interpret Boolean connectives and are free from the scale problem.


Contribution: Integrating Multi-Armed Bandits into Optimization-Based Falsification. As a solution to the scale problem in Boolean superposition that we just described, we introduce a new approach that does not superpose robustness values. Instead, we integrate multi-armed bandits (MAB) in the existing framework of falsification guided by hill-climbing optimization.

[Figure: a two-armed bandit whose arms are $\varphi_1$ and $\varphi_2$.]
Fig. 1. A multi-armed bandit for falsifying $\Box_I(\varphi_1 \wedge \varphi_2)$
The MAB problem is a prototypical reinforcement learning problem: a gambler sits in front of a row of slot machines whose performance (i.e. average reward) is not known; the gambler plays one machine in each round, over many rounds; and the goal is to optimize cumulative rewards. The gambler needs to play different machines and figure out their performance, at the cost of the loss of opportunities in the form of playing suboptimal machines.

In this paper, we focus on specifications of the form $\Box_I(\varphi_1 \wedge \varphi_2)$ and $\Box_I(\varphi_1 \vee \varphi_2)$; we call them (conjunctive/disjunctive) safety properties. We identify an instance of the MAB problem in the choice of the formula (out of $\varphi_1, \varphi_2$) to try to falsify by hill climbing. See Fig. 1. We combine MAB algorithms (such as ε-greedy and UCB1, see Sect. 3.2) with hill-climbing optimization, for the purpose of coping with the scale problem in Boolean superposition. This combination is made possible by introducing a novel reward notion for MAB, called hill-climbing gain, that is tailored for this purpose.
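To make the bandit view concrete, here is a minimal ε-greedy arm selector with incremental reward averages. It is a sketch under stated assumptions: the reward is left abstract (the hill-climbing gain used in this paper is not modelled), random numbers are passed in so that the function is deterministic, and UCB1, the other algorithm mentioned, is not shown. In the falsification setting, arm i would stand for the subformula φ_i; the function names are hypothetical.

```c
/* ε-greedy selection over k arms. u1 and u2 are caller-supplied uniform
   random numbers in [0, 1): with probability eps (u1 < eps) pick an arm
   uniformly using u2, otherwise pick the arm with the best mean reward. */
int eps_greedy_select(int k, const double *mean, double eps,
                      double u1, double u2) {
  if (u1 < eps) {
    int a = (int)(u2 * k);
    return a < k ? a : k - 1;
  }
  int best = 0;
  for (int i = 1; i < k; i++)
    if (mean[i] > mean[best]) best = i;
  return best;
}

/* Incremental update of the empirical mean reward of the played arm. */
void update_mean(double *mean, int *pulls, int arm, double reward) {
  pulls[arm]++;
  mean[arm] += (reward - mean[arm]) / pulls[arm];
}
```

Passing the randomness in keeps the exploration/exploitation decision separate from the random-number source, which makes the selector easy to test.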

We have implemented our MAB-based falsification framework in MATLAB, building on Breach [11].¹ Our experiments with benchmarks from [7,24,25] demonstrate
that our MAB-based approach is a viable one against the scale problem. In particular, 
our approach is observed to be (almost totally) robust under the change of scaling (i.e. 
changing units of measure, such as from rpm to rph that we discussed after the for- 
mula (1)). Moreover, for the benchmarks taken from the previous works—they do not 
suffer much from the scale problem—our algorithm performs better than the state-of- 
the-art falsification tool Breach [11]. 


Related Work. Besides those we mentioned, we shall discuss some related works. 

Formal verification approaches to correctness of hybrid systems employ a wide 
range of techniques, including model checking, theorem proving, rigorous numerics, 
nonstandard analysis, and so on [8, 14, 18,20,22,23,29,32]. These are currently not 
very successful in dealing with complex real-world systems, due to issues like scala- 
bility and black-box components. 

Our use of MAB in falsification exemplifies the role of the exploration-exploitation 
trade-off, the core problem in reinforcement learning. The trade-off has been already 
discussed for the verification of quantitative properties (e.g., [33]) and also in some 
works on falsification. A recent example is [36], where they use Monte Carlo tree search 
to force systematic exploration of the space of input signals. Besides MCTS, Gaussian 
process learning (GP learning) has also attracted attention in machine learning as a 
clean way of balancing exploitation and exploration. The GP-UCB algorithm is a widely 
used strategy there. Its use in hybrid system falsification is pursued e.g. in [3,34]. 

More generally, coverage-guided falsification [1,9,13,28] aims at coping with the 
exploration-exploitation trade-off. One can set the current work in this context—the 
difference is that we force systematic exploration on the specification side, not in the 
input space. 

There have been efforts to enhance expressiveness of MTL and STL, so that engineers can express richer intentions, such as time robustness and frequency, in specifications [2,31]. This research direction is orthogonal to ours; we plan to investigate the use of such logics in our current framework.

¹ Code obtained at https://github.com/decyphir/breach.

A similar masking problem around Boolean connectives is discussed in 
[10, 19]. Compared to those approaches, our technique does not need the explicit dec- 
laration of input vacuity and output robustness, but it relies on the “hill-climbing gain” 
reward to learn the significance of each signal. 

Finally, the interest in the use of deep neural networks is rising in the field of falsi- 
fication (as well as in many other fields). See e.g. [4,27]. 


2 Preliminaries: Hill Climbing-Guided Falsification 


We review a well-adopted methodology for hybrid system falsification, namely the 
one guided by hill-climbing optimization. It makes essential use of quantitative robust 
semantics of temporal formulas, which we review too. 


2.1 Robust Semantics for STL 
Our definitions here are taken from [12,17]. 


Definition 1 ((time-bounded) signal). Let $T \in \mathbb{R}_{>0}$ be a positive real. An M-dimensional signal with a time horizon T is a function $w\colon [0,T] \to \mathbb{R}^M$.

Let $w\colon [0,T] \to \mathbb{R}^M$ and $w'\colon [0,T'] \to \mathbb{R}^M$ be M-dimensional signals. Their concatenation $w \cdot w'\colon [0, T+T'] \to \mathbb{R}^M$ is the M-dimensional signal defined by $(w \cdot w')(t) = w(t)$ if $t \in [0,T]$, and $(w \cdot w')(t) = w'(t - T)$ if $t \in (T, T+T']$.

Let $0 < T_1 < T_2 < T$. The restriction $w|_{[T_1,T_2]}\colon [0, T_2 - T_1] \to \mathbb{R}^M$ of $w\colon [0,T] \to \mathbb{R}^M$ to the interval $[T_1, T_2]$ is defined by $(w|_{[T_1,T_2]})(t) = w(T_1 + t)$.
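On sampled signals, concatenation and restriction reduce to array operations. A minimal sketch for 1-dimensional signals with a fixed sampling step, where a horizon of n steps has n+1 samples; the representation and the names concat and restrict_to are assumptions. Note that w'(0) is never used by the concatenation, since time T is covered by w(T).

```c
#include <stddef.h>

/* w . w': all samples of w on [0, T], then the samples of w' at times in
   (0, T'], shifted by T. Assumes nw2 >= 1; out must have room for
   nw + nw2 - 1 samples. Returns the number of samples written. */
size_t concat(const double *w, size_t nw, const double *w2, size_t nw2,
              double *out) {
  size_t k = 0;
  for (size_t i = 0; i < nw; i++) out[k++] = w[i];
  for (size_t i = 1; i < nw2; i++) out[k++] = w2[i];
  return k;
}

/* Restriction to [T1, T2], given as sample indices i1 <= i2:
   (w|[T1,T2])(t) = w(T1 + t). Returns the number of samples written. */
size_t restrict_to(const double *w, size_t i1, size_t i2, double *out) {
  size_t k = 0;
  for (size_t i = i1; i <= i2; i++) out[k++] = w[i];
  return k;
}
```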


One main advantage of optimization-based falsification is that a system model can be a 
black box—observing the correspondence between input and output suffices. We there- 
fore define a system model simply as a function. 


Definition 2 (system model M). A system model, with M-dimensional input and N-dim. output, is a function M that takes an input signal $u\colon [0,T] \to \mathbb{R}^M$ and returns a signal $M(u)\colon [0,T] \to \mathbb{R}^N$. Here the common time horizon $T \in \mathbb{R}_{>0}$ is arbitrary. Furthermore, we impose the following causality condition on M: for any time-bounded signals $u\colon [0,T] \to \mathbb{R}^M$ and $u'\colon [0,T'] \to \mathbb{R}^M$, we require that $M(u \cdot u')|_{[0,T]} = M(u)$.

Definition 3 (STL syntax). We fix a set Var of variables. In STL, atomic propositions and formulas are defined as follows, respectively: $\alpha ::= f(x_1, \dots, x_N) > 0$, and $\varphi ::= \alpha \mid \bot \mid \neg\varphi \mid \varphi \wedge \varphi \mid \varphi \vee \varphi \mid \varphi \,\mathcal{U}_I\, \varphi$. Here f is an N-ary function $f\colon \mathbb{R}^N \to \mathbb{R}$, $x_1, \dots, x_N \in \mathrm{Var}$, and I is a closed non-singular interval in $\mathbb{R}_{\ge 0}$, i.e. $I = [a,b]$ or $[a, \infty)$ where $a, b \in \mathbb{R}$ and $a < b$.

We omit subscripts I for temporal operators if $I = [0, \infty)$. Other common connectives such as $\rightarrow$, $\top$, $\Box_I$ (always) and $\Diamond_I$ (eventually), are introduced as abbreviations: $\Diamond_I \varphi \equiv \top \,\mathcal{U}_I\, \varphi$ and $\Box_I \varphi \equiv \neg \Diamond_I \neg \varphi$. An atomic formula $f(x) \le c$, where $c \in \mathbb{R}$, is accommodated using $\neg$ and the function $f'(x) := f(x) - c$.
Definition 4 (robust semantics [12]). Let $w\colon [0,T] \to \mathbb{R}^N$ be an N-dimensional signal, and $t \in [0,T)$. The t-shift of w, denoted by $w^t$, is the time-bounded signal $w^t\colon [0, T-t] \to \mathbb{R}^N$ defined by $w^t(t') := w(t + t')$.

Let $w\colon [0,T] \to \mathbb{R}^{|\mathrm{Var}|}$ be a signal, and $\varphi$ be an STL formula. We define the robustness $[\![w, \varphi]\!] \in \mathbb{R} \cup \{\infty, -\infty\}$ as follows, by induction on the construction of formulas. Here $\inf$ and $\sup$ denote infimums and supremums of real numbers, respectively. Their binary versions $\sqcap$ and $\sqcup$ denote minimum and maximum.

$$[\![w, f(x_1, \dots, x_N) > 0]\!] := f\bigl(w(0)(x_1), \dots, w(0)(x_N)\bigr)$$

$$[\![w, \bot]\!] := -\infty \qquad [\![w, \neg\varphi]\!] := -[\![w, \varphi]\!]$$

$$[\![w, \varphi_1 \wedge \varphi_2]\!] := [\![w, \varphi_1]\!] \sqcap [\![w, \varphi_2]\!] \qquad [\![w, \varphi_1 \vee \varphi_2]\!] := [\![w, \varphi_1]\!] \sqcup [\![w, \varphi_2]\!]$$

$$[\![w, \varphi_1 \,\mathcal{U}_I\, \varphi_2]\!] := \sup_{t \in I \cap [0,T]} \Bigl( [\![w^t, \varphi_2]\!] \sqcap \inf_{t' \in [0,t)} [\![w^{t'}, \varphi_1]\!] \Bigr) \qquad (2)$$

For atomic formulas, $[\![w, f(x) > c]\!]$ stands for the vertical margin $f(x) - c$ for the signal w at time 0. A negative robustness value indicates how far the formula is from being true. It follows from the definition that the robustness for the eventually modality is given by $[\![w, \Diamond_{[a,b]}(x > 0)]\!] = \sup_{t \in [a,b] \cap [0,T]} w(t)(x)$.
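Equation (2) can be evaluated in one pass on a discretized signal by carrying the running infimum of the φ1 robustness. A minimal sketch, assuming the robustness values of the subformulas at each sampled shift are already available and that I is given as a range of sample indices; the name rob_until is hypothetical. Eventually is recovered as ⊤ U_I φ, using $[\![w, \top]\!] = -[\![w, \bot]\!] = \infty$.

```c
#include <math.h>

/* Discrete-time version of (2) at time 0: r1[t] and r2[t] are the
   robustness values of φ1 and φ2 on the t-shift, for t = 0..n-1, and
   I = [ia, ib] in sample indices (clipped to the horizon). Computes the
   max over t in I of min(r2[t], min over t' < t of r1[t']). */
double rob_until(const double *r1, const double *r2, int n, int ia, int ib) {
  double best = -INFINITY;  /* supremum over an empty set */
  double prefix = INFINITY; /* infimum over [0, t), empty at t = 0 */
  int hi = ib < n - 1 ? ib : n - 1;
  for (int t = 0; t <= hi; t++) {
    if (t >= ia) {
      double v = r2[t] < prefix ? r2[t] : prefix;
      if (v > best) best = v;
    }
    if (r1[t] < prefix) prefix = r1[t];
  }
  return best;
}
```

With φ1 = ⊤ this returns the maximum of r2 over I, matching the eventually formula above.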

The above robustness notion taken from [12] is therefore spatial. Other robustness 
notions take temporal aspects into account, too, such as “how long before the deadline 
the required event occurs”. See e.g. [2, 12]. Our choice of spatial robustness in this paper 
is for the sake of simplicity, and is thus not essential. 

The original semantics of STL is Boolean, given as usual by a binary relation $\models$ between signals and formulas. The robust semantics refines the Boolean one in the following sense: $[\![w, \varphi]\!] > 0$ implies $w \models \varphi$, and $[\![w, \varphi]\!] < 0$ implies $w \not\models \varphi$, see [17, Prop. 16]. Optimization-based falsification via robust semantics hinges on this refinement.


2.2 Hill Climbing-Guided Falsification 


As we discussed in the introduction, the falsification problem attracts growing industrial 
and academic attention. Its solution methodology by hill-climbing optimization is an 
established field, too: see [1,3,5,9, 11-13, 15,26,28,34,38] and the tools Breach [11] 
and S-TaLiRo [5]. We formulate the problem and the methodology, for later use in 
describing our multi-armed bandit-based algorithm. 


Definition 5 (falsifying input). Let M be a system model, and $\varphi$ be an STL formula. A signal $u\colon [0,T] \to \mathbb{R}^{|\mathrm{Var}|}$ is a falsifying input if $[\![M(u), \varphi]\!] < 0$; the latter implies $M(u) \not\models \varphi$.


The use of quantitative robust semantics $[\![M(u), \varphi]\!] \in \mathbb{R} \cup \{\infty, -\infty\}$ in the above problem enables the use of hill-climbing optimization.


Definition 6 (hill climbing-guided falsification). Assume the setting in Definition 5. 
For finding a falsifying input, the methodology of hill climbing-guided falsification is 
presented in Algorithm 1. 
Here the function HILL-CLIMB makes a guess of an input signal $u_k$, aiming at minimizing the robustness $[\![M(u_k), \varphi]\!]$. It does so, learning from the previous observations $(u_l, [\![M(u_l), \varphi]\!])_{l \in [1, k-1]}$ of input signals $u_1, \dots, u_{k-1}$ and their corresponding robustness values (cf. Table 1).


The HILL-CLIMB function can be implemented by various stochastic optimization 
algorithms. Examples are CMA-ES [6] (used in our experiments), SA, and GNM [30]. 


3 Our Multi-armed Bandit-Based Falsification Algorithm 


In this section, we present our contribution, namely a falsification algorithm that 
addresses the scale problem in Boolean superposition (see Sect. 1). The main novel- 
ties in the algorithm are as follows. 


1. (Use of MAB algorithms) For binary Boolean connectives, unlike most works in the field, we do not superpose the robustness values of the constituent formulas $\varphi_1$ and $\varphi_2$ using a fixed operator (such as $\sqcap$ and $\sqcup$ in (2)). Instead, we view the situation as an instance of the multi-armed bandit problem (MAB): we use an algorithm for MAB to choose one formula $\varphi_i$ to focus on (here $i \in \{1, 2\}$); and then we apply hill climbing-guided falsification to the chosen formula $\varphi_i$.

2. (Hill-climbing gain as rewards in MAB) For our integration of MAB and hill-climbing optimization, the technical challenge is to find a suitable notion of reward for MAB. We introduce a novel notion that we call hill-climbing gain: it formulates the (downward) robustness gain that we would obtain by applying hill-climbing optimization, suitably normalized using the scale of previous robustness values.


Later, in Sect. 4, we demonstrate that combining those two features gives rise to falsifi- 
cation algorithms that successfully cope with the scale problem in Boolean superposi- 
tion. 

Our algorithms focus on a fragment of STL as target specifications. They are called 
(disjunctive and conjunctive) safety properties. In Sect.3.1 we describe this fragment 
of STL, and introduce necessary adaptation of the semantics. After reviewing the MAB 
problem in Sect. 3.2, we present our algorithms in Sects. 3.3, 3.4. 


Algorithm 1. Hill climbing-guided falsification
Require: a system model M, an STL formula φ, and a budget K
1: function HILL-CLIMB-FALSIFY(M, φ, K)
2:   rb ← ∞; k ← 0            ▷ rb is the smallest robustness so far, initialized to ∞
3:   while rb > 0 and k < K do
4:     k ← k + 1
5:     u_k ← HILL-CLIMB((u_l, ⟦M(u_l), φ⟧)_{l ∈ [1, k−1]})
6:     rb_k ← ⟦M(u_k), φ⟧
7:     if rb_k < rb then rb ← rb_k
8:   u ← u_k if rb < 0, that is, rb_k = ⟦M(u_k), φ⟧ < 0;
        Failure otherwise, that is, no falsifying input found within budget K
9:   return u
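The control flow of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this paper: the composition of the model M with the robustness computation ⟦M(u), φ⟧ is assumed to be available as a single function of a finite-dimensional input vector u ∈ [0, 1]^d (e.g. a parameterization of the input signal), and HILL-CLIMB is replaced by a hypothetical Gaussian local search around the best input seen so far rather than CMA-ES. The toy robustness function is made up.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

Input = List[float]


def hill_climb(history: Sequence[Tuple[Input, float]], dim: int,
               rng: random.Random, step: float = 0.1) -> Input:
    """Hypothetical stand-in for HILL-CLIMB (not CMA-ES).

    Proposes the next input in [0, 1]^dim from past (input, robustness) pairs:
    a uniformly random point at first, then a Gaussian perturbation of the
    input with the smallest robustness seen so far.
    """
    if not history:
        return [rng.random() for _ in range(dim)]
    best_u, _ = min(history, key=lambda pair: pair[1])
    return [min(1.0, max(0.0, x + rng.gauss(0.0, step))) for x in best_u]


def hill_climb_falsify(robustness: Callable[[Input], float], dim: int,
                       budget: int, seed: int = 0) -> Optional[Input]:
    """Sketch of Algorithm 1; robustness(u) plays the role of [[M(u), phi]].

    Returns a falsifying input, or None for Failure.
    """
    rng = random.Random(seed)
    rb = math.inf                      # smallest robustness so far
    history: List[Tuple[Input, float]] = []
    u_k: Optional[Input] = None
    k = 0
    while rb > 0 and k < budget:
        k += 1
        u_k = hill_climb(history, dim, rng)
        rb_k = robustness(u_k)
        history.append((u_k, rb_k))
        if rb_k < rb:
            rb = rb_k
    # The loop stops as soon as rb drops below 0, so u_k is the falsifying input.
    return u_k if rb < 0 else None


# Toy usage: the (made-up) robustness is negative only near u = (0.8, 0.8).
def toy_robustness(u: Input) -> float:
    return sum((x - 0.8) ** 2 for x in u) - 0.01


print(hill_climb_falsify(toy_robustness, dim=2, budget=500))
```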


408 Z. Zhang et al. 


3.1 Conjunctive and Disjunctive Safety Properties 


Definition 7 (conjunctive/disjunctive safety property). An STL formula of the form
□_I(φ1 ∧ φ2) is called a conjunctive safety property; an STL formula of the form
□_I(φ1 ∨ φ2) is called a disjunctive safety property.


It is known that, in industry practice, a majority of specifications are of the form
□_I(φ1 → φ2), where φ1 describes a trigger and φ2 describes a countermeasure that
should follow. This property is equivalent to □_I(¬φ1 ∨ φ2), and is therefore a
disjunctive safety property.

In Sects. 3.3 and 3.4, we present two falsification algorithms, for conjunctive and
disjunctive safety properties respectively. For the reason just discussed, we expect the
disjunctive algorithm to be more important in real-world application scenarios. The
disjunctive algorithm is also more complicated, and it is best introduced as an extension
of the conjunctive algorithm.

We define the restriction of robust semantics to a (sub)set of time instants. Note that
we do not require S ⊆ [0, T] to be a single interval.


Definition 8 (⟦w, φ⟧_S, robustness restricted to S ⊆ [0, T]). Let w: [0, T] → ℝ^{|Var|}
be a signal, φ be an STL formula, and S ⊆ [0, T] be a subset. We define the robustness
of w under φ restricted to S by

$$[\![ w, \varphi ]\!]_S = \inf_{t \in S} [\![ w^t, \varphi ]\!] \qquad (3)$$


Obviously, ⟦w, φ⟧_S < 0 implies that there exists t ∈ S such that ⟦w^t, φ⟧ < 0. We
derive the following easy lemma; it is used later in our algorithm.


Lemma 9. In the setting of Definition 8, consider a disjunctive safety property φ =
□_I(φ1 ∨ φ2), and let S := {t ∈ I ∩ [0, T] | ⟦w^t, φ1⟧ < 0}. Then ⟦w, φ2⟧_S < 0 implies
⟦w, □_I(φ1 ∨ φ2)⟧ < 0.
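On a sampled signal, the infimum in Definition 8 becomes a minimum over the sampled instants in S. The Python sketch below assumes that the pointwise robustness values ⟦w^t, φ⟧ have already been computed for each sample time t; the function name and the dictionary representation are illustrative only. An empty S yields +∞, the infimum of the empty set.

```python
import math
from typing import Dict, Iterable


def restricted_robustness(pointwise: Dict[float, float],
                          S: Iterable[float]) -> float:
    """[[w, phi]]_S for a sampled signal.

    pointwise[t] is assumed to hold [[w^t, phi]] at each sample time t.
    On finitely many samples the infimum in (3) is a minimum; over an
    empty S it is +inf.
    """
    return min((pointwise[t] for t in S), default=math.inf)


pointwise = {0.0: 2.0, 0.5: -1.0, 1.0: 3.0}
print(restricted_robustness(pointwise, [0.0, 1.0]))  # 2.0
print(restricted_robustness(pointwise, [0.5, 1.0]))  # -1.0: t = 0.5 witnesses falsification
print(restricted_robustness(pointwise, []))          # inf
```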


3.2 The Multi-Armed Bandit (MAB) Problem 
The multi-armed bandit (MAB) problem describes a situation where, 


– a gambler sits in front of a row A_1, ..., A_n of slot machines;

– each slot machine A_j gives, when its arm is played (i.e. in each attempt), a reward
according to a prescribed (but unknown) probability distribution μ_j;

– and the goal is to maximize the cumulative reward after a number of attempts, playing
a suitable arm in each attempt.


The best strategy, of course, is to keep playing the best arm A_max, i.e. the one whose
average reward avg(μ_max) is the greatest. This best strategy is infeasible, however,
since the distributions μ_1, ..., μ_n are initially unknown. Therefore the gambler must
learn about μ_1, ..., μ_n through attempts.

The MAB problem exemplifies the “learning by trying” paradigm of reinforcement
learning, and is thus heavily studied. The greatest challenge is to balance exploration
and exploitation. A greedy (i.e. exploitation-only) strategy will play the arm whose
empirical average reward is the maximum. However, since the rewards are random, the
gambler may then miss another arm whose real performance is even better but has not
yet been observed to be so. Therefore one needs to mix in exploration, too, occasionally
trying empirically non-optimal arms in order to identify their true performance.

The relevance of MAB to our current problem is as follows. Falsifying a conjunctive
safety property □_I(φ1 ∧ φ2) amounts to finding a time instant t ∈ I at which either φ1
or φ2 is falsified. We can see the two subformulas (φ1 and φ2) as two arms, and this
constitutes an instance of the MAB problem. In particular, playing an arm translates to
a falsification attempt by hill climbing, and collecting rewards translates to spending
time to minimize the robustness. We show in Sects. 3.3–3.4 that this basic idea extends
to disjunctive safety properties □_I(φ1 ∨ φ2), too.


Algorithm 2. The ε-greedy algorithm for multi-armed bandits

Require: the setting of Def. 10, and a constant ε > 0 (typically very small)
At the k-th attempt, choose the arm A_{i_k} as follows
1: j_emp-opt ← arg max_{j ∈ [1, n]} R(j, k − 1)     ▷ the arm that is empirically optimal
2: Sample i_k ∈ [1, n] from the distribution
      j_emp-opt ↦ (1 − ε) + ε/n
      j ↦ ε/n   for each j ∈ [1, n] \ {j_emp-opt}
3: return i_k


A rigorous formulation of the MAB problem is presented for the record. 


Definition 10 (the multi-armed bandit problem). The multi-armed bandit (MAB) 
problem is formulated as follows. 

Input: arms (A_1, ..., A_n), the associated probability distributions μ_1, ..., μ_n over ℝ,
and a time horizon H ∈ ℕ ∪ {∞}.

Goal: synthesize a sequence A_{i_1}, A_{i_2}, ..., A_{i_H} so that the cumulative reward
Σ_{k=1}^{H} rew_k is maximized. Here the reward rew_k of the k-th attempt is sampled from
the distribution μ_{i_k} associated with the arm A_{i_k} played at the k-th attempt.

We introduce some notations for later use. Let (A_{i_1} ... A_{i_k}, rew_1 ... rew_k) be a
history, i.e. the sequence of arms played so far (here i_1, ..., i_k ∈ [1, n]), and the
sequence of rewards obtained by those attempts (rew_l is sampled from μ_{i_l}).

For an arm A_j, its visit count N(j, A_{i_1} A_{i_2} ... A_{i_k}, rew_1 rew_2 ... rew_k) is given
by the number of occurrences of A_j in A_{i_1} A_{i_2} ... A_{i_k}. Its empirical average reward
R(j, A_{i_1} A_{i_2} ... A_{i_k}, rew_1 rew_2 ... rew_k) is given by
(Σ_{l ∈ {l ∈ [1, k] | i_l = j}} rew_l) / N(j, A_{i_1} ... A_{i_k}, rew_1 ... rew_k), i.e. the
average return of the arm A_j in the history. When the history is obvious from the
context, we simply write N(j, k) and R(j, k).
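The two history statistics can be computed directly from the list of played arms and the list of observed rewards. In this Python sketch the function names are hypothetical, and returning 0 as the empirical average of an arm that has never been played is an assumption that the definition itself does not make.

```python
from typing import Sequence


def visit_count(j: int, arms: Sequence[int]) -> int:
    """N(j, k): the number of occurrences of arm j in i_1 ... i_k."""
    return sum(1 for i in arms if i == j)


def empirical_average(j: int, arms: Sequence[int],
                      rewards: Sequence[float]) -> float:
    """R(j, k): the mean of the rewards rew_l with i_l = j.

    Returns 0.0 for an arm that has not been played (an assumption).
    """
    picked = [rew for i, rew in zip(arms, rewards) if i == j]
    return sum(picked) / len(picked) if picked else 0.0


arms = [1, 2, 1, 1]              # i_1 ... i_4
rewards = [0.2, 0.5, 0.4, 0.6]   # rew_1 ... rew_4
print(visit_count(1, arms), empirical_average(1, arms, rewards))
```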


MAB Algorithms. There have been a number of algorithms proposed for the MAB 
problem; each of them gives a strategy (also called a policy) that tells which arm to 
play, based on the previous attempts and their rewards. The focus here is how to resolve 
the exploration-exploitation trade-off. Here we review two well-known algorithms. 
The ε-Greedy Algorithm. This is a simple algorithm that spares a small fraction ε of
chances for empirically non-optimal arms. The spared probability ε is distributed
uniformly over all n arms. See Algorithm 2.
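A minimal Python sketch of Algorithm 2, assuming the empirical averages R(j, k − 1) are given as a list indexed from 0 rather than 1; ties in the arg max go to the lowest index, a choice the algorithm leaves open. The sampling step uses the standard-library `random.Random.choices`.

```python
import random
from typing import List, Sequence


def epsilon_greedy_probs(avg_rewards: Sequence[float], eps: float) -> List[float]:
    """Sampling distribution of Algorithm 2 (arms indexed 0..n-1 here).

    Every arm gets eps / n; the empirically optimal arm additionally gets
    1 - eps, so it is chosen with probability (1 - eps) + eps / n.
    """
    n = len(avg_rewards)
    best = max(range(n), key=lambda j: avg_rewards[j])  # first maximum wins ties
    probs = [eps / n] * n
    probs[best] += 1.0 - eps
    return probs


def epsilon_greedy_choose(avg_rewards: Sequence[float], eps: float,
                          rng: random.Random) -> int:
    """Sample the arm i_k from the distribution above."""
    probs = epsilon_greedy_probs(avg_rewards, eps)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


rng = random.Random(1)
print(epsilon_greedy_probs([0.1, 0.7, 0.3], eps=0.3))  # approx. [0.1, 0.8, 0.1]
print(epsilon_greedy_choose([0.1, 0.7, 0.3], eps=0.3, rng=rng))
```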


The UCB1 Algorithm. The UCB1 (upper confidence bound) algorithm is more complex;
it comes with a theoretical upper bound on its regret, i.e. the gap between the
expected cumulative reward and the optimal (but infeasible) cumulative reward (i.e.
the result of always playing the optimal arm A_max). It is known that the UCB1
algorithm's regret is at most O(√(nH log H)) after H attempts, improving on the naive
random strategy (whose expected regret is O(H)).

See Algorithm 3. The algorithm is deterministic, and picks the arm that maximizes
the value shown in Line 1. The first term R(j, k − 1) is the exploitation factor, reflecting
the arm's empirical performance. The second term is the exploration factor. Note that
it is bigger if the arm A_j has been played less frequently. Note also that the exploration
factor eventually decays over time: the denominator grows roughly with O(k), while
the numerator grows with O(ln k).


Algorithm 3. The UCB1 algorithm for multi-armed bandits

Require: the setting of Def. 10, and a constant c > 0
At the k-th attempt, choose the arm A_{i_k} as follows
1: i_k ← arg max_{j ∈ [1, n]} ( R(j, k − 1) + c · √(2 ln k / N(j, k − 1)) )
2: return i_k
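Line 1 of Algorithm 3 can be sketched in Python as below, with the exploration term c · √(2 ln k / N(j, k − 1)). Arms are indexed from 0, and the rule of playing any never-played arm first is an assumption added here to avoid dividing by N(j, k − 1) = 0.

```python
import math
from typing import Sequence


def ucb1_choose(avg_rewards: Sequence[float], counts: Sequence[int],
                k: int, c: float) -> int:
    """Line 1 of Algorithm 3 (arms indexed 0..n-1 here).

    avg_rewards[j] plays R(j, k-1) and counts[j] plays N(j, k-1).
    """
    for j, n_j in enumerate(counts):
        if n_j == 0:                 # assumption: unplayed arms go first
            return j

    def ucb(j: int) -> float:
        # exploitation term + exploration term
        return avg_rewards[j] + c * math.sqrt(2.0 * math.log(k) / counts[j])

    return max(range(len(counts)), key=ucb)


print(ucb1_choose([0.5, 0.4], counts=[10, 1], k=12, c=1.0))  # 1: rarely played arm
print(ucb1_choose([0.5, 0.4], counts=[10, 1], k=12, c=0.0))  # 0: pure exploitation
```

With c = 0 the rule degenerates to the greedy strategy; a larger c shifts the balance towards exploration.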


Algorithm 4. Our MAB-guided algorithm I: conjunctive safety properties

Require: a system model M, an STL formula φ = □_I(φ1 ∧ φ2), and a budget K
1: function MAB-FALSIFY-CONJ-SAFETY(M, φ, K)
2:   rb ← ∞; k ← 0
       ▷ rb is the smallest robustness seen so far, for either □_I φ1 or □_I φ2
3:   while rb > 0 and k < K do        ▷ iterate if not yet falsified, and within budget
4:     k ← k + 1
5:     i_k ← MAB((φ1, φ2), (R(φ1), R(φ2)), φ_{i_1} ... φ_{i_{k−1}}, rew_1 ... rew_{k−1})
         ▷ an MAB choice of i_k ∈ {1, 2} for optimizing the reward R(φ_{i_k})
6:     u_k ← HILL-CLIMB((u_l, rb_l)_{l ∈ [1, k−1], i_l = i_k})
         ▷ suggestion of the next input u_k by hill climbing, based on the previous
           observations on the formula φ_{i_k} (those on the other formula are ignored)
7:     rb_k ← ⟦M(u_k), □_I φ_{i_k}⟧
8:     if rb_k < rb then rb ← rb_k
9:   u ← u_k if rb < 0;
        Failure otherwise, that is, no falsifying input found within budget K
10:  return u
Algorithm 5. Our MAB-guided algorithm II: disjunctive safety properties

Require: a system model M, an STL formula φ = □_I(φ1 ∨ φ2), and a budget K
1: function MAB-FALSIFY-DISJ-SAFETY(M, φ, K)
   The same as Algorithm 4, except that Line 7 is replaced by the following Line 7'.
7': rb_k ← ⟦M(u_k), φ_{i_k}⟧_{S_k}, where S_k = {t ∈ I ∩ [0, T] | ⟦M(u_k)^t, φ_{ī_k}⟧ < 0}
      ▷ here φ_{ī_k} denotes the other formula than φ_{i_k}, among φ1, φ2


3.3 Our MAB-Guided Algorithm I: Conjunctive Safety Properties 


Our first algorithm targets conjunctive safety properties. It is based on our identification
of an MAB instance in a Boolean conjunction in falsification, as discussed just
above Definition 10. The technical novelty lies in the way we combine MAB algorithms
and hill-climbing optimization; specifically, we introduce the notion of hill-climbing
gain as a reward notion in MAB (Definition 11). This first algorithm also paves the way
to the one for disjunctive safety properties (Sect. 3.4).

The algorithm is shown in Algorithm 4. Some remarks are in order.

Algorithm 4 aims to falsify a conjunctive safety property φ = □_I(φ1 ∧ φ2). Its
overall structure is to interleave two sequences of falsification attempts, both of which
are hill climbing-guided. These two sequences of attempts aim to falsify □_I φ1 and
□_I φ2, respectively. Note that ⟦M(u), φ⟧ ≤ ⟦M(u), □_I φ1⟧; therefore falsification of
□_I φ1 implies falsification of φ. The same holds for □_I φ2, too.

In Line 5 we run an MAB algorithm to decide which of □_I φ1 and □_I φ2
to target in the k-th attempt. The function MAB takes the following as its
arguments: (1) the list of arms, given by the formulas φ1, φ2; (2) their rewards
R(φ1), R(φ2); (3) the history φ_{i_1}, ..., φ_{i_{k−1}} of previously played arms (i_l ∈
{1, 2}); and (4) the history rew_1 ... rew_{k−1} of previously observed rewards. This
way, the type of the MAB function in Line 5 matches the format in Definition 10,
and thus the function can be instantiated with any MAB algorithm such as
Algorithms 2 and 3.

The only missing piece is the definition of the rewards R(φ1), R(φ2). We introduce
the following notion, tailored for combining MAB and hill climbing.


Definition 11 (hill-climbing gain). In Algorithm 4, in Line 5, the reward R(φ_i) of the
arm φ_i (where i ∈ {1, 2}) is defined by

$$R(\varphi_i) = \begin{cases} \dfrac{\text{max-rb}(i, k-1) - \text{last-rb}(i, k-1)}{\text{max-rb}(i, k-1)} & \text{if } \varphi_i \text{ has been played before,} \\ 0 & \text{otherwise.} \end{cases}$$

Here max-rb(i, k − 1) := max{rb_l | l ∈ [1, k − 1], i_l = i} (i.e. the greatest rb_l so far,
in those attempts where φ_i was played), and last-rb(i, k − 1) := rb_last, with last being
the greatest l ∈ [1, k − 1] such that i_l = i (i.e. the last rb_l for φ_i).


Since we try to minimize the robustness values rb_l through falsification attempts, we
can expect that rb_l for a fixed arm φ_i decreases over time. (In the case of the
hill-climbing algorithm CMA-ES that we use, this is in fact guaranteed.) Therefore the value
max-rb(i, k − 1) in the definition of R(φ_i) is the first observed robustness value. The
numerator max-rb(i, k − 1) − last-rb(i, k − 1) then represents how much robustness
we have reduced so far by hill climbing; hence the name “hill-climbing gain.” The
denominator max-rb(i, k − 1) is there for normalization.
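Definition 11 translates into a few lines of Python. The sketch below uses hypothetical names and assumes positive robustness values, as holds while Algorithm 4 has not yet falsified the property. It also illustrates why this reward is insensitive to the scale of a formula: multiplying all robustness values of one arm by a positive constant cancels out between numerator and denominator.

```python
from typing import Sequence


def hill_climbing_gain(i: int, played: Sequence[int],
                       rbs: Sequence[float]) -> float:
    """R(phi_i) of Definition 11.

    played[l] is i_l (the formula targeted at attempt l) and rbs[l] is rb_l.
    Assumes the robustness values observed for arm i are positive.
    """
    obs = [rb for arm, rb in zip(played, rbs) if arm == i]
    if not obs:
        return 0.0                      # phi_i has not been played yet
    max_rb = max(obs)
    last_rb = obs[-1]
    return (max_rb - last_rb) / max_rb


played = [1, 2, 1, 2, 1]
rbs = [10.0, 0.5, 6.0, 0.45, 4.0]
print(hill_climbing_gain(1, played, rbs))   # (10 - 4) / 10 = 0.6
print(hill_climbing_gain(2, played, rbs))   # about 0.1

# Scaling the robustness of formula 1 by 1000 leaves its gain unchanged.
scaled = [rb * 1000 if arm == 1 else rb for arm, rb in zip(played, rbs)]
print(hill_climbing_gain(1, played, scaled))
```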

In Algorithm 4, the value rb_l is given by the robustness ⟦M(u_l), □_I φ_{i_l}⟧. Therefore
the MAB choice in Line 5 essentially picks the i_k for which hill climbing yields a greater
effect (while also taking exploration into account; see Sect. 3.2).

In Line 6 we conduct hill-climbing optimization (see Sect. 2.2). The function
HILL-CLIMB learns from the previous attempts u_{l_1}, ..., u_{l_m} regarding the same
formula φ_{i_k}, and their resulting robustness values rb_{l_1}, ..., rb_{l_m}. Then it suggests
the next input signal u_k that is likely to minimize the (unknown) function that underlies
the correspondences [u_{l_j} ↦ rb_{l_j}]_{j ∈ [1, m]}.

Lines 6–8 read as follows: the hill-climbing algorithm suggests a single input u_k,
which is then selected or rejected (Line 8) based on the robustness value it yields
(Line 7). We note that this is a simplified picture: in our implementation, which uses
CMA-ES (an evolutionary algorithm), we maintain a population of some ten particles,
and each of them is moved multiple times (three times, in our choice) before the best
one is chosen as u_k.
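To show how Lines 5–8 interleave the two sequences of attempts, the following Python sketch combines UCB1 and the hill-climbing gain in one loop. Several simplifications are assumptions of this sketch, not part of Algorithm 4 as implemented: HILL-CLIMB is a hypothetical Gaussian local search with a single candidate per attempt (not CMA-ES with a population), each arm is a Python function returning the robustness of □_I φ_i for an input vector in [0, 1]^d, the gain is used directly as UCB1's exploitation term, and each arm is played once before UCB1 takes over. The two toy arms are made up so that their robustness values differ by orders of magnitude.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

Input = List[float]
Robustness = Callable[[Input], float]


def suggest(history: Sequence[Tuple[Input, float]], dim: int,
            rng: random.Random) -> Input:
    """Hypothetical HILL-CLIMB stand-in (not CMA-ES): perturb the best input."""
    if not history:
        return [rng.random() for _ in range(dim)]
    best_u = min(history, key=lambda pair: pair[1])[0]
    return [min(1.0, max(0.0, x + rng.gauss(0.0, 0.1))) for x in best_u]


def gain(obs: Sequence[float]) -> float:
    """Hill-climbing gain (Definition 11) of one arm's robustness history."""
    return (max(obs) - obs[-1]) / max(obs) if obs else 0.0


def mab_falsify_conj(arms: Sequence[Robustness], dim: int, budget: int,
                     c: float = 0.5, seed: int = 0) -> Optional[Input]:
    """Sketch of Algorithm 4 with UCB1; arms[i](u) ~ [[M(u), []_I phi_i]]."""
    rng = random.Random(seed)
    histories: List[List[Tuple[Input, float]]] = [[] for _ in arms]
    rb: float = math.inf
    u_k: Optional[Input] = None
    k = 0
    while rb > 0 and k < budget:
        k += 1
        counts = [len(h) for h in histories]
        if 0 in counts:                          # try every arm once first
            i_k = counts.index(0)
        else:                                    # Line 5: UCB1 over the gains
            gains = [gain([r for _, r in h]) for h in histories]
            i_k = max(range(len(arms)),
                      key=lambda j: gains[j]
                      + c * math.sqrt(2.0 * math.log(k) / counts[j]))
        u_k = suggest(histories[i_k], dim, rng)  # Line 6: chosen arm's data only
        rb_k = arms[i_k](u_k)                    # Line 7
        histories[i_k].append((u_k, rb_k))
        rb = min(rb, rb_k)                       # Line 8
    return u_k if rb < 0 else None


# Toy arms: arm0 stays >= 500 on [0, 1]^2, arm1 is negative once u[1] > 0.95.
def arm0(u: Input) -> float:
    return 1000.0 * (u[0] + 0.5)


def arm1(u: Input) -> float:
    return 0.95 - u[1]


u = mab_falsify_conj([arm0, arm1], dim=2, budget=300)
print(u is not None and arm1(u) < 0)
```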


3.4 Our MAB-Guided Algorithm II: Disjunctive Safety Properties 


The other main algorithm of ours aims to falsify a disjunctive safety property φ =
□_I(φ1 ∨ φ2). We believe this problem setting is even more important than the
conjunctive case, since it encompasses conditional safety properties (i.e. of the form
□_I(φ1 → φ2)). See Sect. 3.1 for discussions.

In the disjunctive setting, the challenge is that falsification of □_I φ_i (with i ∈ {1, 2})
does not necessarily imply falsification of □_I(φ1 ∨ φ2). This is unlike the conjunctive
setting. Therefore we need some adaptation of Algorithm 4, so that the two interleaved
sequences of falsification attempts for φ1 and φ2 are not totally independent of each
other. Our solution consists of restricting time instants to those where φ2 is false, in a
falsification attempt for φ1 (and vice versa), in the way described in Definition 8.

Algorithm 5 shows our MAB-guided algorithm for falsifying a disjunctive safety
property □_I(φ1 ∨ φ2). The only visible difference is that Line 7 in Algorithm 4 is
replaced with Line 7'. The new Line 7' measures the quality of the suggested input
signal u_k in a way restricted to the region S_k in which the other formula is already
falsified. Lemma 9 guarantees that, if rb_k < 0, then the input signal u_k indeed falsifies
the original specification □_I(φ1 ∨ φ2).
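The effect of Line 7' can be checked on a discretized time grid. In this Python sketch (hypothetical names, made-up values), the two lists hold pointwise robustness values of φ_{i_k} and of the other formula at the same sampled instants of I; the robustness of □_I(φ1 ∨ φ2) is computed as the minimum over time of the pointwise maximum, following the max/min robust semantics. Whenever the restricted value is negative, so is the robustness of the disjunction, as Lemma 9 states.

```python
import math
from typing import Sequence


def line7_prime(rho_chosen: Sequence[float], rho_other: Sequence[float]) -> float:
    """rb_k of Line 7' on a common time grid t_0, t_1, ... inside I.

    rho_chosen[n] ~ [[M(u_k)^{t_n}, phi_{i_k}]]; rho_other[n] ~ the same for
    the other formula. S_k keeps the instants where the other formula is
    false; the result is +inf when S_k is empty.
    """
    S_k = [n for n, r in enumerate(rho_other) if r < 0]
    return min((rho_chosen[n] for n in S_k), default=math.inf)


def disjunction_robustness(rho1: Sequence[float], rho2: Sequence[float]) -> float:
    """[[w, []_I (phi1 or phi2)]] on the same grid: min over t of max(rho1, rho2)."""
    return min(max(a, b) for a, b in zip(rho1, rho2))


rho1 = [0.4, -0.2, -0.1, 0.3]    # made-up pointwise robustness of phi1
rho2 = [-0.5, 0.6, -0.3, 0.2]    # made-up pointwise robustness of phi2
rb_k = line7_prime(rho1, rho2)   # S_k = {t_0, t_2}, so rb_k = min(0.4, -0.1)
print(rb_k, disjunction_robustness(rho1, rho2))   # -0.1 -0.1
```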

The assumption that makes Algorithm 5 sensible is that, although it can be hard
to find a time instant at which both φ1 and φ2 are false (this is required in falsifying
□_I(φ1 ∨ φ2)), falsifying φ1 (or φ2) individually is not hard. Without this assumption,
the region S_k in Line 7' would be empty most of the time. Our experiments in Sect. 4
demonstrate that this assumption is valid in many problem instances, and that
Algorithm 5 is effective.
4 Experimental Evaluation


We name MAB-ε-greedy and MAB-UCB the two versions of our MAB-guided algorithm
that use the ε-greedy strategy (see Algorithm 2) and UCB1 (see Algorithm 3),
respectively. We compared the proposed approach (both versions MAB-ε-greedy and
MAB-UCB) with a state-of-the-art falsification framework, namely Breach [11]. Breach
encapsulates several hill-climbing optimization algorithms, including CMA-ES
(covariance matrix adaptation evolution strategy) [6], SA (simulated annealing), GNM
(global Nelder-Mead) [30], etc. In our experience, CMA-ES outperforms the other
hill-climbing solvers in Breach, so the experiments for both Breach and our approach
rely on the CMA-ES solver.

Experiments have been executed using Breach 1.2.13 on an Amazon EC2 c4.large 
instance, 2.9 GHz Intel Xeon E5-2666, 2 virtual CPU cores, 4GB RAM. 


Benchmarks. We selected three benchmark models from the literature, each one having
different specifications. The first one is the Automatic Transmission (AT) model
[16,24]. It has two input signals, throttle ∈ [0, 100] and brake ∈ [0, 325], and computes
the car's speed, the engine speed rpm (in revolutions per minute), and the automatically
selected gear. The specifications concern the relation between the three output signals,
to check whether the car is subject to some unexpected or unsafe behaviors. The second
benchmark is the Abstract Fuel Control (AFC) model [16,25]. It takes two input signals,
pedal angle ∈ [8.8, 90] and engine speed ∈ [900, 1100], and outputs the critical
signal air-fuel ratio (AF), which influences fuel efficiency and car performance. The
value is expected to be close to a reference value AF_ref; mu = |AF − AF_ref| / AF_ref
is the deviation of AF from AF_ref. The specifications check whether this property
holds under both normal mode and power enrichment mode. The third benchmark is a
model of a magnetic levitation system with a NARMA-L2 neurocontroller (NN) [7,16].
It takes one input signal, Ref ∈ [1, 3], which is the reference for the output signal Pos,
the position of a magnet suspended above an electromagnet. The specifications say that
the position should approach the reference signal within a few seconds when the two
are not close.


Table 2. Benchmark sets Bbench and Sbench

(a) Bbench (here Δ_τ(w) represents w^t(τ) − w^t(0))

Bench  Spec ID  Formula                                              Parameter
AT     AT1      □_[0,30]((gear = 3) → (speed > p))                   p ∈ {20.6, 20.4, 20.2, 20, 19.8}
       AT2      □_[0,30]((gear = 4) → (speed > p))                   p ∈ {43, 41, 39, 37, 35}
       AT3      □_[0,30]((gear = 4) → (rpm > p))                     p ∈ {700, 800, 900, 1000, 1100}
       AT4      □_[0,30−τ]((Δ_10(rpm) > 2000) → (Δ_τ(gear) > 0))     τ ∈ {15, 16, 17, 18, 19}
       AT5      □_[0,30]((speed < p) ∧ (rpm < 4780))                 p ∈ {130, 131, 132, 133, 134, 135, 136, 137}
       AT6      □_[0,26]((Δ_4(speed) > p) → (Δ_4(gear) > 0))         p ∈ {20, 25, 30, 35, 40}
       AT7      □_[0,30−τ]((Δ_τ(speed) > 30) → (Δ_τ(gear) > 0))      τ ∈ {2, 3, 4, 5, 6, 7, 8}
AFC    AFC1     □_[11,50]((controller_mode = 0) → (mu < p))          p ∈ {0.16, 0.17, 0.18, 0.19, 0.2}
       AFC2     □_[11,50]((controller_mode = 1) → (mu < p))          p ∈ {0.222, 0.224, 0.226, 0.228, 0.23}
NN              close = |Pos − Ref| ≤ p + α·|Ref|,  reach = ◇_[0,2](□_[0,1](close))
       NN1      □_[0,18](¬close → reach), α = 0.04                   p ∈ {0.001, 0.002, 0.003, 0.004, 0.005}
       NN2      □_[1,18](¬close → reach), α = 0.03                   p ∈ {0.001, 0.002, 0.003, 0.004, 0.005}

(b) Sbench

Spec ID              Scaled output   Scale factors 10^k
AT1₁, ..., AT1₅      speed           k ∈ {−2, 0, 1, 3}
AT5₄, ..., AT5₈      speed           k ∈ {−2, 0, 1, 3}
AFC1₁, ..., AFC1₅    mu              k ∈ {0, 1, 2, 3}
We built the benchmark set Bbench, as shown in Table 2a, which reports the name of
the model and its specifications (ID and formula). In total, we found 11 specifications.
In order to enlarge the benchmark set and obtain specifications of different complexity,
we artificially modified a constant of each specification (turning it into a parameter
named τ if it is contained in a time interval, and named p otherwise): for each
specification S, we generated m different versions, named S_i with i ∈ {1, ..., m}; the
complexity of the specification (in terms of difficulty to falsify it) increases with
increasing i.² In total, we produced 60 specifications. Column Parameter in the table
shows which concrete values we used for the parameters p and τ. Note that all the
specifications but one are disjunctive safety properties (i.e., □_I(φ1 ∨ φ2)), as they are
the most difficult case and the main target of our approach; we only add AT5 as an
example of a conjunctive safety property (i.e., □_I(φ1 ∧ φ2)).

Our approach has been proposed with the aim of tackling the scale problem. Therefore,
to better show how our approach mitigates this problem, we generated a second
benchmark set Sbench as follows. We selected 15 specifications from Bbench (with
concrete values for the parameters) and, for each specification S, we changed the
corresponding Simulink model by multiplying one of its outputs by a factor 10^k, with
k ∈ {−2, 0, 1, 2, 3} (the values used for each specification are listed in Table 2b; note
that we also include the original model, i.e. scale factor 10^0); the specification has been
modified accordingly, by multiplying the constants that are compared with the scaled
output by the same scale factor. We name a specification S scaled with factor 10^k as S^k.
Table 2b reports the IDs of the original specifications, the output that has been scaled,
and the scale factors used; in total, the benchmark set Sbench contains 60 specifications.


Experiment. In our context, an experiment consists of the execution of an approach
A (either Breach, MAB-ε-greedy, or MAB-UCB) over a specification S for 30 trials,
using different initial seeds. For each experiment, we record the number of successes
SR, i.e. the number of trials in which a falsifying input was found, and the average
execution time of the trials. Complete experimental results are reported in Appendix A
in the extended version [37]. We report aggregated results in Table 3.

For benchmark set Bbench, Table 3 reports aggregated results for each group of
specifications obtained from a specification S (i.e., all the different versions S_i obtained
by changing the value of the parameter); for benchmark set Sbench, instead, results are
aggregated for each scaled specification S^k (considering the versions S_i^k obtained by
changing the parameter value). We report the minimum, maximum, and average number
of successes SR, and of the time in seconds. For MAB-ε-greedy and MAB-UCB, both for
SR and time, we also report the average percentage difference⁴ (Δ) w.r.t. the
corresponding value of Breach.
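Reading “average percentage difference” as the mean of the per-specification Δ values of footnote 4, a short Python sketch reproduces the Δ of 127 that Table 3 reports for the SR of MAB-UCB on AFC1², using the per-specification SR values of Table 4. Returning 0 when both results are 0 is an assumption, since the formula is undefined there.

```python
from statistics import mean


def percentage_difference(m: float, b: float) -> float:
    """Delta of footnote 4: ((m - b) * 100) / (0.5 * (m + b)).

    m is the MAB result and b the Breach result. Returning 0.0 when
    m = b = 0 is an assumption; the formula is undefined there.
    """
    if m + b == 0:
        return 0.0
    return ((m - b) * 100) / (0.5 * (m + b))


# SR of AFC1_1^2 ... AFC1_5^2 from Table 4: Breach (b) and MAB-UCB (m).
breach = [12, 8, 2, 4, 0]
mab_ucb = [30, 25, 15, 11, 4]
deltas = [percentage_difference(m, b) for m, b in zip(mab_ucb, breach)]
print(round(mean(deltas), 1))   # 127.0
```

Because Δ divides by the mean of the two results, it is bounded by ±200, which is the value reached when Breach never succeeds (as for AFC1₅²).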


Comparison. In the following, we compare two approaches A1, A2 ∈ {Breach,
MAB-ε-greedy, MAB-UCB} by comparing their numbers of successes SR and their
average execution times using the non-parametric Wilcoxon signed-rank test with 5%


2 Note that we performed this classification based on the falsification results of Breach.

3 The code, models, and specifications are available online at
https://github.com/ERATOMMSD/FalStar-MAB.

4 Δ = ((m − b) × 100)/(0.5 × (m + b)), where m is the result of MAB and b that of Breach.


Multi-armed Bandits for Boolean Connectives in Hybrid System Falsification 415

Table 3. Aggregated results for benchmark sets Bbench and Sbench (SR: # successes out of 30
trials. Time in secs. Δ: percentage difference w.r.t. Breach). Outperformance cases are
highlighted, indicated by positive Δ of SR, and negative Δ of time.

Spec.  | Breach SR (/30) | Breach time (sec.) | MAB-ε-greedy SR (/30) | MAB-ε-greedy time (sec.) | MAB-UCB SR (/30) | MAB-UCB time (sec.)
ID     | Min Max Avg     | Min Max Avg        | Min Max Avg Δ         | Min Max Avg Δ            | Min Max Avg Δ    | Min Max Avg Δ
AT1    | 14 25 20.2 | 125 361.2 223.1   | 24 30 28.6 35.7  | 62.7 213.4 106.4 −73.4  | 28 30 29.2 37.8  | 45.1 146.8 77.4 −97.1
AT2    | 11 30 20.2 | 14 390.6 209.8    | 30 30 30 43.9    | 11.9 126.3 54.5 −96.9   | 27 30 29.4 42.2  | 17.7 92.5 36.8 −112.1
AT3    | 29 30 29.4 | 2.3 22.2 14.2     | 30 30 30 2       | 2.5 7 3.5 −82.9         | 30 30 30 2       | 2.5 3.6 3 −138.6
AT4    | 18 30 25.8 | 19.5 265.3 109.6  | 29 30 29.8 16    | 7.8 45.1 24.4 −105      | 30 30 30 16.6    | 6.2 36.2 22.2 −113.5
AT5    | 6 23 14.1  | 203.1 525.9 366.2 | 26 30 28.5 72.1  | 35.2 149 93.7 −120.6    | 26 30 28.2 71.4  | 37.7 154.1 99.2 −116.8
AT6    | 5 29 22.8  | 30.1 509.5 157    | 21 30 27 28      | 2.3 300 95.1 −98.3      | 22 30 27 27.7    | 2.9 247.3 86.1 −99.4
AT7    | 15 30 26.6 | 12.2 314 81.5     | 20 30 28.6 8.4   | 2.9 283.9 49.9 −92      | 23 30 29 10.3    | 5.5 223.3 42.9 −88.3
AFC1   | 6 30 14.4  | 124.8 565.6 413.5 | 4 28 12 −28.4    | 171 568.4 446 10.8      | 5 30 16.4 9.7    | 98.7 559.8 389.9 −9.3
AFC2   | 2 30 18    | 80.7 582.3 343.4  | 5 30 20 23.8     | 43.2 547.8 301.9 −23.8  | 5 30 20 22.9     | 59.4 568.4 320.5 −11.1
NN1    | 17 25 20.8 | 212.9 384.7 292.9 | 14 27 20.2 −4.5  | 189.5 422.8 320.3 6.2   | 17 28 22.6 7.3   | 148.2 403.3 272.3 −11.8
NN2    | 27 28 27.2 | 55.5 93.4 73.1    | 30 30 30 9.8     | 11 39.3 26.3 −97.8      | 30 30 30 9.8     | 14.6 38.2 27.4 −92.3
AT1⁻²  | 30 30 30   | 42.5 97.4 56.9    | 28 30 29 −3.4    | 75.6 178.3 118.7 68.7   | 28 30 29.4 −2.1  | 54.3 136.3 80.3 33.3
AT1⁰   | 14 25 20.2 | 125 361.2 223.1   | 24 30 28.6 35.7  | 62.7 213.4 106.4 −73.4  | 28 30 29.2 37.8  | 45.1 146.8 77.4 −97.1
AT1¹   | 4 21 15.4  | 204.5 527.6 310.2 | 25 30 29 68.4    | 49 234.7 102.1 −108     | 27 29 28.2 64.5  | 77.5 128.7 105.1 −93
AT1³   | 8 24 19.8  | 164 471.7 240.1   | 29 30 29.8 44.6  | 67.5 170.6 101.9 −77.3  | 29 30 29.4 43.4  | 55.4 104.8 80.6 −93.6
AT5⁻²  | 29 30 29.6 | 61.1 163.7 102    | 25 30 27.8 −6.4  | 76.9 139.5 111.9 12.6   | 28 30 29.4 −0.7  | 48.5 131.9 85.7 −17
AT5⁰   | 6 18 11.2  | 291.1 525.9 423.1 | 28 30 28.4 90.5  | 80.2 151.3 107.4 −117.7 | 26 30 28 89.4    | 68.3 154.1 114.9 −114.5
AT5¹   | 0 2 0.4    | 566.4 600 593.3   | 27 30 28.4 194.8 | 70.7 184.5 110.3 −138.5 | 25 30 27.6 194.1 | 83.1 150 123.7 −131.2
AT5³   | 0 1 0.2    | 586.4 600 597.3   | 27 30 28.6 197.2 | 66.8 163.3 102.5 −142.3 | 27 29 28 197.2   | 80.4 160.9 111.9 −137.4
AFC1⁰  | 6 30 14.4  | 124.8 565.6 413.5 | 4 29 16.4 8.5    | 115.1 559.9 411.1 −2.8  | 5 30 16.4 9.7    | 98.7 559.8 389.9 −9.3
AFC1¹  | 7 30 16.6  | 99 548.2 393.3    | 3 29 10.8 −60.9  | 198.1 587.6 465.8 24.6  | 7 29 17.8 10.3   | 105.7 527.3 354.3 −10.3
AFC1²  | 0 12 5.2   | 434.4 600 535.8   | 3 28 11.6 96.2   | 180.8 577.6 463 −20.7   | 4 30 17 127      | 73.7 556.3 374.5 −47.3
AFC1³  | 1 12 4.8   | 425.7 587.4 532.6 | 3 30 14.4 109    | 138 585.5 436.5 −28     | 7 30 15 113      | 77.1 553.4 403.7 −39.9


level of significance⁵ [35]; the null hypothesis is that there is no difference in applying
A1 or A2 in terms of the compared measure (SR or time).


4.1 Evaluation 


We evaluate the proposed approach with some research questions. 


RQ1 Which is the best MAB algorithm for our purpose?


In Sect. 3.2, we described that the proposed approach can be executed using two
different strategies for choosing the arm in the MAB problem, namely MAB-ε-greedy
and MAB-UCB. We here assess which one is better in terms of SR and time. From the
results in Table 3, it seems that MAB-UCB provides slightly better performance in terms
of SR; this has been confirmed by the Wilcoxon test applied over all the experiments
(i.e., on the non-aggregated data reported in Appendix A in the extended version [37]):
the null hypothesis that the choice between the two strategies has no impact on SR is
rejected with p-value equal to 0.005089, and the alternative hypothesis that MAB-UCB
has a better SR is accepted with p-value = 0.9975; in a similar way, the null hypothesis
that there is no difference in terms of time is rejected with p-value equal to 3.495e−06,
and the alternative hypothesis that MAB-UCB is faster is accepted with p-value = 1.
Therefore, in the following RQs, we compare Breach only with the MAB-UCB version of
our approach.


5 We checked that the distributions are not normal with the non-parametric Shapiro-Wilk test.
RQ2 Does the proposed approach effectively solve the scale problem?

We here assess whether our approach is effective in tackling the scale problem. Table 4
reports the complete experimental results over Sbench for Breach and MAB-UCB;
for each specification S, all its scaled versions are reported in increasing order of the
scaling factor. We observe that changing the scaling factor affects (sometimes greatly)
the number of successes SR of Breach; for example, for AT5₅ and AT5₇ it goes from
30 to 0. For MAB-UCB, instead, SR is similar across the scaled versions of each
specification: this shows that the approach is robust w.r.t. the scale problem, as the
“hill-climbing gain” reward in Definition 11 eliminates the impact of scaling and the UCB1
algorithm balances the exploration and exploitation of the two subformulas. The
observation is confirmed by the Wilcoxon test over SR: the null hypothesis is rejected with
p-value = 1.808e−09, and the alternative hypothesis is accepted with p-value = 1. Instead,
the null hypothesis that there is no difference in terms of time cannot be rejected
(p-value = 0.3294).


RQ3 How does the proposed approach behave with non-scaled benchmarks?

In RQ2, we checked whether the proposed approach is able to tackle the scale
problem for which it has been designed. Here, instead, we are interested in
investigating how it behaves on specifications that have not been artificially scaled
(i.e., those in Bbench). From Table 3 (upper part), we observe that MAB-UCB
is always better than Breach both in terms of SR and time, as shown by the positive
Δ of SR and the negative Δ of time in every row. This is confirmed by the Wilcoxon test
over SR and time: the null hypotheses are rejected with p-values equal to, respectively,
6.02e−08 and 1.41e−08, and the alternative hypotheses that MAB-UCB is better are both
accepted

Table 4. Experimental results: Sbench (SR: # successes out of 30 trials. Time in secs)

Spec.   Breach         MAB-UCB        | Spec.   Breach         MAB-UCB        | Spec.   Breach         MAB-UCB
ID      SR    time     SR    time     | ID      SR    time     SR    time     | ID      SR    time     SR    time
        (/30) (sec.)   (/30) (sec.)   |         (/30) (sec.)   (/30) (sec.)   |         (/30) (sec.)   (/30) (sec.)
AT1₁⁻²  30    51.3     30    54.3     | AT5₄⁻²  30    61.1     30    48.5     | AFC1₁⁰  30    124.8    30    98.7
AT1₁⁰   25    125      29    75       | AT5₄⁰   18    291.1    28    94.5     | AFC1₁¹  30    99       29    105.7
AT1₁¹   20    221.1    28    107.9    | AT5₄¹   2     566.4    25    150      | AFC1₁²  12    434.4    30    73.7
AT1₁³   23    170      29    55.4     | AT5₄³   1     586.4    28    96.2     | AFC1₁³  12    425.7    30    77.1
AT1₂⁻²  30    49       29    67.5     | AT5₅⁻²  30    71.3     29    67.8     | AFC1₂⁰  16    421.5    23    346.8
AT1₂⁰   22    187.5    30    45.1     | AT5₅⁰   15    369.1    27    114      | AFC1₂¹  25    345.9    27    227.9
AT1₂¹   21    204.5    29    77.5     | AT5₅¹   0     600      29    83.1     | AFC1₂²  8     497.2    25    320.5
AT1₂³   24    164      30    61       | AT5₅³   0     600      27    113.8    | AFC1₂³  5     518.1    21    364
AT1₃⁻²  30    42.5     30    62.4     | AT5₆⁻²  29    110.2    28    103.3    | AFC1₃⁰  11    457.7    15    442
AT1₃⁰   19    239.5    29    62.5     | AT5₆⁰   10    438.2    30    68.3     | AFC1₃¹  13    479.2    14    455.5
AT1₃¹   16    296.2    27    128.7    | AT5₆¹   0     600      27    126.7    | AFC1₃²  2     590.7    15    453.2
AT1₃³   21    209.8    30    93.4     | AT5₆³   0     600      29    80.4     | AFC1₃³  5     545.6    8     510.6
AT1₄⁻²  30    44.5     30    80.8     | AT5₇⁻²  30    103.6    30    77.3     | AFC1₄⁰  9     498.2    9     502.1
AT1₄⁰   21    202.2    30    57.4     | AT5₇⁰   7     491.4    26    154.1    | AFC1₄¹  8     494      12    455
AT1₄¹   16    301.7    28    119.5    | AT5₇¹   0     600      27    134.3    | AFC1₄²  4     556.8    11    468.7
AT1₄³   23    185.1    29    88.3     | AT5₇³   0     600      29    108      | AFC1₄³  1     587.4    9     513.4
AT1₅⁻²  30    97.4     28    136.3    | AT5₈⁻²  29    163.7    30    131.9    | AFC1₅⁰  6     565.6    5     559.8
AT1₅⁰   14    361.2    28    146.8    | AT5₈⁰   6     525.9    29    143.6    | AFC1₅¹  7     548.2    7     527.3
AT1₅¹   4     527.6    29    91.9     | AT5₈¹   0     600      30    124.2    | AFC1₅²  0     600      4     556.3
AT1₅³   8     471.7    29    104.8    | AT5₈³   0     600      27    160.9    | AFC1₅³  1     586      7     553.4
with p-value = 1. This means that the proposed approach can also handle specifications
that do not suffer from the scale problem, so its use is not limited to specifications
affected by it.


RQ4 Is the proposed approach more effective than an approach based on rescaling? 

A naive solution to the scale problem could be to rescale the signals used in the
specification to the same scale. Thanks to the results of RQ2, we can compare against this
possible baseline approach, using the scaled benchmark set Sbench. For example, AT5
suffers from the scale problem, as speed is one order of magnitude smaller than rpm.
However, from Table 3, we observe that the scaling that would be done by the baseline
approach (i.e., running Breach over AT5¹) is not effective, as SR is 0.4/30, which is much
lower than the original SR of 14.1/30 obtained by Breach on the unscaled specifications.
Our approach, instead, raises SR to 28.4/30 and 27.6/30 using the two proposed
versions. By monitoring Breach execution, we notice that the naive approach fails
because it tries to falsify rpm < 4780, which, however, is not falsifiable; our approach,
instead, learns that it must try to falsify speed < p. More details are given in the
extended version [37].


5 Conclusion and Future Work 


In this paper, we propose a solution to the scale problem that affects falsification of spec- 
ifications containing Boolean connectives. The approach combines multi-armed bandit 
algorithms with hill climbing-guided falsification. Experiments show that the approach 
is robust under the change of scales, and it outperforms a state-of-the-art falsification 
tool. The approach currently handles binary specifications. As future work, we plan to 
generalize it to complex specifications having more than two Boolean connectives. 
Abstract. With the ever-increasing autonomy of cyber-physical systems,
monitoring becomes an integral part of ensuring the safety of the system
at runtime. StreamLAB is a monitoring framework with a high degree
of expressiveness and strong correctness guarantees. Specifications are
written in RTLola, a stream-based specification language with formal
semantics. StreamLAB provides an extensive analysis of the specification,
including the computation of memory consumption and run-time
guarantees. We demonstrate the applicability of StreamLAB on typical
monitoring tasks for cyber-physical systems, such as sensor validation
and system health checks.


1 Introduction 


In stream-based monitoring, we translate input streams containing data
collected at runtime, such as sensor readings, into output streams containing
aggregate statistics, such as an average value, a counter, or the integral of a signal.
Trigger specifications define thresholds and other logical conditions on the
values of these output streams, and raise an alarm or execute some other
predefined action if the condition becomes true. The advantage of this setup is great
expressiveness and easy-to-reuse, compositional specifications. Existing
stream-based languages like Lola [9,12] are based on the synchronous programming
paradigm, where all streams are synchronized via a global clock. In each step,
the new values of all output streams are computed in terms of the values of
the other streams at the current or a previous time step. This paradigm provides a
simple and natural evaluation model that fits well with typical implementations on
synchronous hardware. In real-time applications, however, the assumption that
all data arrives synchronously is often simply not true. Consider, for example, 
an autonomous drone with several sensors, such as a GPS module, an inertial
measurement unit, and a laser distance meter. While a synchronous arrival of
all measured values would be desirable, some sensors measure at a higher
frequency than others. Moreover, the sensors do not necessarily operate on a
common clock, so their readings drift apart over time.

In this paper, we present the monitoring framework StreamLAB. We lift the
synchronicity assumption to allow for monitoring asynchronous systems. The basis
of the framework is RTLola, an extension of the stream-based runtime verification
language Lola. RTLola introduces two new key concepts into Lola:


1. Variable-rate input streams: we consider input streams that extend at a priori
unknown rates. The only assumption is that each new event has a real-valued
timestamp and that the events arrive in order.

2. Sliding windows: A sliding window aggregates data over a real-time window 
given in units of time. For example, we might integrate the readings of an 
airspeed indicator. 


As with any semantic extension, the challenge in the design of RTLola is to 
maintain the efficiency of the monitoring. Not all RTLola specifications can be
monitored with constant memory: since the rates of the input streams are unknown,
an arbitrary number of events may occur in the span of a fixed real-time unit.
Thus, for aggregations such as the mean that require storing the whole sequence
of values, no constant amount of memory suffices in general. We can, nevertheless,
again identify an efficiently monitorable fragment that covers many specifications
of practical interest. For the space-efficient aggregation over real-time sliding
windows, we partition the real-time axis into equally-sized intervals. The size of
the intervals is dictated by the rate of the output streams. For certain common
types of aggregations, such as the sum or the number of entries, the values within
each interval can be pre-aggregated and then stored only in this summarized form.
In a static analysis of the specification, we identify parts of the specification
with unbounded memory consumption and compute bounds for all other parts of
the specification. In this way, we can determine early whether a particular
specification can be executed on a system with limited memory.


Related Work. There is a rich body of work on monitoring real-time proper- 
ties. Many monitoring approaches are based on real-time variants of temporal 
logics [3,11,16-18,24]. Maler and Ničković present a monitoring algorithm for
properties written in signal temporal logic (STL) by reducing STL formulas via
a Boolean abstraction to formulas in the real-time logic MITL [21]. Building
on these ideas, Donzé et al. present an algorithm for the monitoring of STL
properties over continuous signals [10]. The algorithm computes the robustness
degree to which a piecewise-continuous signal satisfies or violates an STL
formula. Towards more practical approaches, Basin et al. extend metric logics with
parameterization [8]. A monitoring algorithm for the extension is implemented
in the tool MonPoly [5]. MonPoly was introduced as a tool for monitoring
usage-control policies. Another extension to metric dynamic logic was implemented in
the tool Aerial [7]. However, most monitors generated from temporal logics are
limited to Boolean verdicts.

Fig. 1. Illustration of the decoupled input and output using aggregations.

StreamLAB uses the stream-based language RTLola as its core specification 
language. RTLola builds upon Lola [9,12], which is a stream-based language 
originally developed for monitoring synchronous hardware circuits, by adding the 
concepts discussed above. Stream-based monitoring languages are significantly 
more expressive than temporal logics. Other prominent stream-based monitoring 
approaches are the Copilot framework [23] and the tool BeepBeep 3 [15]. Copilot 
is a dataflow language based on several declarative stream processing languages 
[9,14]. From a specification in Copilot, constant space and constant time C 
programs implementing embedded monitors are generated. The BeepBeep 3 tool 
uses an SQL-like language that is defined over streams of events. In addition to 
stream processing, it contains operators such as slicing, where inputs can be
separated into several different traces, and windowing, where aggregations over a
sliding window can be computed. Unlike RTLola, BeepBeep and Copilot assume
a synchronous computation model, where all events arrive at a fixed rate. Two
asynchronous real-time monitoring approaches are TeSSLa [19] and Striver [13].
TeSSLa allows for monitoring piecewise-constant signals, where streams can emit
events at different speeds with arbitrary latencies. Neither language provides
sliding windows or fixed-rate output streams. The efficient evaluation of
aggregations on sliding windows [20] has
previously been studied in the context of temporal logic [4]. Basin et al. present 
an algorithm for combining the elements of subsequences of a sliding window 
with an associative operator, which reuses the results of the subsequences in the 
evaluation of the next window [6]. 


2 Real-Time Lola 


RTLola extends the stream-based specification language Lola [12] with real-time
features. In the stream-based processing paradigm, sensor readings are viewed
as input streams to a stream processing engine that computes outputs in the form
of streams on top of the values of the input streams. For example, the RTLola
specification 


input altitude : Float32 
output tooLow := altitude < 200.0 
checks whether a drone flies at an altitude of less than 200 feet. For each reading
of the altitude sensor, a new value for the output stream tooLow is computed.
Streams marked with the trigger keyword alert the user when the value of
the trigger is true. In the following example, the user is warned when the drone
flies below the allowed altitude.


trigger tooLow "flying below minimum altitude" 


Output streams in RTLola are computed from values of the input streams, other 
output streams and their own past values. If we want to count the number of
times the drone dives below 200 feet, we can specify the stream


output count := (if tooLow then 1 else 0)
    + count.offset(by:-1).defaults(to:0)


Here, the stream count computes its new values by increasing its latest value by 
1 in case the drone currently flies below the permitted altitude. The expres- 
sion count.offset(by:-1) represents the last value of the stream. We call 
such expressions “lookup expressions”. The default operator e.defaults(to:0) 
returns the value 0 in case the value of e is not defined. This can happen when
a stream is evaluated for the first time and looks up its last value.
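To make the offset and default semantics concrete, here is a minimal Python sketch of how tooLow and count evolve over a few altitude readings. It assumes a simple model with one evaluation step per altitude reading; the function name monitor_altitude and the sample readings are hypothetical and not part of StreamLAB.

```python
# Hypothetical sketch (not StreamLAB's evaluator): event-based evaluation of
# tooLow and count, one evaluation step per altitude reading.


def monitor_altitude(readings, threshold=200.0):
    """Return the tooLow and count streams for a sequence of altitude readings."""
    too_low_stream, count_stream = [], []
    prev_count = None  # count.offset(by:-1) is undefined before the first step
    for altitude in readings:
        too_low = altitude < threshold
        # count.offset(by:-1).defaults(to:0)
        last = 0 if prev_count is None else prev_count
        count = (1 if too_low else 0) + last
        too_low_stream.append(too_low)
        count_stream.append(count)
        prev_count = count
        if too_low:  # trigger tooLow "flying below minimum altitude"
            print("flying below minimum altitude")
    return too_low_stream, count_stream


too_low, count = monitor_altitude([250.0, 190.0, 180.0, 210.0, 150.0])
print(too_low)  # [False, True, True, False, True]
print(count)    # [0, 1, 2, 2, 3]
```

As written, count grows with every reading below 200 feet, so two consecutive low readings count twice; counting distinct dives would additionally require comparing tooLow with its previous value.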

In RTLola, we do not impose any assumption on the arrival frequency of
input streams. Each stream can produce new values individually and at arbitrary
points in time. This can lead to problems when a burst of new input values
occurs in a short amount of time: the monitor then needs to evaluate all output
streams for each of them, exerting a lot of pressure on the system. To prevent this,
RTLola distinguishes between two kinds of outputs. Event-based outputs are
computed whenever new input values arrive and should thus only contain
inexpensive operations. All streams discussed above were event-based. In contrast
to that, there are periodic outputs such as the following:


output freqDev @5Hz := altitude.aggregate(over: 200ms,
                                           using: count) < 5


Here, freqDev is evaluated every 200 ms, as indicated by the "@5Hz"
label, independently of arriving input values. The stream freqDev does not access
the event-based input altitude directly, but uses a sliding window expression to
count the number of new altitude values that arrived within the last 200 ms, i.e.,
the number of measurements the monitor received from the altimeter. Comparing
this count against the expected number of readings (here, 5) allows for detecting
deviations and thus a potentially damaged sensor.
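The following Python sketch illustrates how a periodic stream like freqDev can be evaluated independently of event arrivals. It is a sketch under assumptions: the window is taken to cover the half-open interval (t − 200 ms, t], and the 25 Hz altimeter that drops three readings is hypothetical; neither detail comes from the specification. The name freq_dev is likewise illustrative.

```python
from bisect import bisect_right


def freq_dev(event_times_ms, period_ms=200, window_ms=200, end_ms=1000, expected=5):
    """Evaluate a freqDev-like periodic stream at t = period_ms, 2*period_ms, ...

    Assumption: the window covers the half-open interval (t - window_ms, t].
    Returns (t, number of events in the window, count < expected) per evaluation.
    """
    times = sorted(event_times_ms)
    results = []
    for t in range(period_ms, end_ms + 1, period_ms):
        # number of timestamps strictly after t - window_ms and at most t
        n = bisect_right(times, t) - bisect_right(times, t - window_ms)
        results.append((t, n, n < expected))
    return results


# Hypothetical 25 Hz altimeter (one reading every 40 ms) that drops
# the readings at 440, 480 and 520 ms.
altitude_times = [t for t in range(0, 1001, 40) if not 440 <= t <= 520]
for row in freq_dev(altitude_times):
    print(row)
# (200, 5, False)
# (400, 5, False)
# (600, 2, True)
# (800, 5, False)
# (1000, 5, False)
```

The sketch keeps every timestamp in memory; how StreamLAB bounds this memory is the subject of Sect. 3.2.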

Sliding windows allow for decoupling event-based and periodic streams, as 
illustrated in Fig. 1. Since the specifier has no control over the frequency of
event-based streams, these streams should be cheap to evaluate. More expensive
operations, such as sliding windows, may only be used in periodic streams to 
increase the monitor’s robustness. 
2.1 Examples


In the following, we present several properties showcasing RTLola's
expressiveness. The specifications are simplified for illustration and thus
not immediately applicable to the real world.


Sensor Validation. When a sensor starts to deteriorate, it can misbehave and 
drop single measurements. To verify that a GPS sensor produces values at its
specified frequency, in this example 10 Hz, we count the number of sensor values
in a real-time sliding window and compare it against the expected number of
events in this time frame.
input lat: Float32, lon: Float32
output gps_freq @10Hz :=
    lat.aggregate(over: =1s, using: count).defaults(to:9)
trigger gps_freq < 9 "GPS sensor frequency < 9 Hz"


Assuming that we have another sensor measuring the true air speed, we 
can check whether the measured data matches the GPS data using RTLola’s 
computation primitives. For this, we first compute the difference in longitude 
and latitude between the current and last measurement. The Euclidean distance
provides the length of the movement vector; dividing it by the amount of time that
has passed between two GPS measurements yields a discrete approximation of the velocity.


input velo: Float32

output dlon := lon - lon.offset(by:-1).defaults(to:lon)
output dlat := lat - lat.offset(by:-1).defaults(to:lat)
output gps_dist := sqrt(dlon * dlon + dlat * dlat)
output gps_velo := gps_dist
    / (time - time.offset(by:-1).defaults(to:0.0))
trigger abs(gps_velo - velo) > 0.1 "Deviating velocity"
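A minimal Python sketch of the same computation follows. It assumes that the GPS and airspeed values arrive aligned (one airspeed value per GPS sample) and that the first timestamp is nonzero, since the default 0.0 for time.offset(by:-1) would otherwise cause a division by zero, exactly as in the specification. The function names gps_velocities and deviating are hypothetical.

```python
import math


def gps_velocities(samples):
    """samples: (time, lat, lon) tuples; mirrors dlon, dlat, gps_dist, gps_velo."""
    velos = []
    prev = None
    for time, lat, lon in samples:
        # offset(by:-1).defaults(to: current value) for lat/lon, to 0.0 for time
        prev_time, prev_lat, prev_lon = prev if prev is not None else (0.0, lat, lon)
        dlon = lon - prev_lon
        dlat = lat - prev_lat
        gps_dist = math.sqrt(dlon * dlon + dlat * dlat)
        velos.append(gps_dist / (time - prev_time))
        prev = (time, lat, lon)
    return velos


def deviating(gps_velos, airspeeds, tolerance=0.1):
    """trigger abs(gps_velo - velo) > 0.1, assuming both streams are aligned."""
    return [abs(g - v) > tolerance for g, v in zip(gps_velos, airspeeds)]


gps = gps_velocities([(1.0, 0.0, 0.0), (2.0, 3.0, 4.0), (4.0, 3.0, 4.0)])
print(gps)                                # [0.0, 5.0, 0.0]
print(deviating(gps, [0.0, 5.05, 1.0]))   # [False, False, True]
```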


When the pathfinding algorithm of the mission planner takes longer than
expected, the system remains in a state without a target location and thus hovers
in place. Such a hover period can be detected by computing the distance covered
in the last few seconds. For this, we integrate the assumed velocity. We also exclude
a strong headwind as a culprit for the small change in position.


input wnd_dir: Float32, wnd_spd: Float32

output dir := arctan(lat/lon)
output headwind := abs(wnd_dir - dir) < 0.2
    ∧ wnd_spd > 10.0
output hovering @1Hz := velo.aggregate(over: 5s, using: ∫)
    .defaults(to:0.5) < 0.5 ∧ ¬headwind.hold().defaults(to:⊥)
trigger hovering "Long hover phase"
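The hovering check can be approximated in Python as below. This is only a sketch under assumptions: the integral is computed with the trapezoidal rule over the velocity samples inside (t − 5 s, t], ignoring the segment before the first sample in the window, and a missing headwind value is treated as false. StreamLAB's actual integration semantics may differ, and all names here are hypothetical.

```python
def window_integral(samples, t, window):
    """Trapezoidal integral of (timestamp, velocity) samples inside (t - window, t].

    Returns None (undefined) if the window holds fewer than two samples.
    """
    pts = [(ts, v) for ts, v in samples if t - window < ts <= t]
    if len(pts) < 2:
        return None
    return sum((t1 - t0) * (v0 + v1) / 2 for (t0, v0), (t1, v1) in zip(pts, pts[1:]))


def hovering(velo_samples, headwind_events, t, window=5.0):
    """Evaluate a hovering-like periodic stream at time t (assumed semantics)."""
    dist = window_integral(velo_samples, t, window)
    dist = 0.5 if dist is None else dist           # .defaults(to:0.5)
    past = [hw for ts, hw in headwind_events if ts <= t]
    headwind = past[-1] if past else False         # hold(); missing value -> False
    return dist < 0.5 and not headwind


slow = [(float(s), 0.05) for s in range(0, 11)]    # 0.05 m/s for 10 s
print(hovering(slow, [], t=10.0))                  # True
print(hovering(slow, [(8.0, True)], t=10.0))       # False
```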
3 Performance Guarantees via Static Analysis


3.1 Type System 


RTLola is a strongly-typed specification language. Every expression has two 
orthogonal types: a value type and a stream type. The value type is Bool, String, 
Int, or Float. It indicates the usual semantics of a value or expression and 
the amount of memory required to store the value. The stream type indicates 
when a value is evaluated. For periodic streams, the stream type defines the
frequency at which the stream is computed. Event-based streams do not have a
predetermined period. The stream type for an event-based stream identifies a set
of input streams, indicating that the event-based stream is extended whenever
there is, synchronously, an event on all of these input streams. Event-based streams may
also depend on input streams not listed in the type; in such cases, the type
system requires an explicit use of the zero-order sample-and-hold operator.

The type system provides runtime guarantees for the monitor: Independently 
of the arrival of input data, it is guaranteed that all required data is available 
whenever a stream is extended: either the data was just received as an input event,
was computed as an output stream value, or the specifier provided a default value.
The type system can, thus, eliminate classes of specification problems like
unintentionally accessing a slower stream from a faster stream. Whenever possible,
the tool infers types automatically.
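To illustrate the event-based rule, the Python sketch below checks whether a stream may read another one synchronously or needs an explicit hold. The encoding, the treatment of periodic-to-periodic access (allowed only at equal frequencies), and the inferred types of the example streams are our assumptions for illustration, not StreamLAB's type checker.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class StreamType:
    """Hypothetical encoding: event-based streams carry a set of input streams,
    periodic streams carry a frequency in Hz."""
    inputs: Optional[FrozenSet[str]] = None
    freq_hz: Optional[float] = None


def needs_hold(reader: StreamType, accessed: StreamType) -> bool:
    """True if `reader` cannot rely on `accessed` having a fresh value whenever
    `reader` is extended, so an explicit hold() would be required."""
    if reader.inputs is not None and accessed.inputs is not None:
        # If the accessed stream's inputs are a subset of the reader's inputs,
        # it is extended whenever the reader is.
        return not accessed.inputs <= reader.inputs
    if reader.freq_hz is not None and accessed.freq_hz is not None:
        return reader.freq_hz != accessed.freq_hz  # conservative assumption
    return True  # mixing event-based and periodic timing


def ev(*names):
    return StreamType(inputs=frozenset(names))


gps_velo_t, velo_t = ev("lat", "lon"), ev("velo")
deviation_trigger_t = ev("lat", "lon", "velo")
hovering_t = StreamType(freq_hz=1.0)
headwind_t = ev("lat", "lon", "wnd_dir", "wnd_spd")

print(needs_hold(deviation_trigger_t, gps_velo_t))  # False
print(needs_hold(deviation_trigger_t, velo_t))      # False
print(needs_hold(gps_velo_t, velo_t))               # True
print(needs_hold(hovering_t, headwind_t))           # True
```

The last case mirrors the hovering example, which reads the event-based headwind stream from a 1 Hz stream via hold().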


3.2 Sliding Windows 


We use two techniques to ensure that we only need a bounded amount of memory
to compute sliding windows. Meertens [22] classifies an aggregation $\gamma: A^* \to B$
as a list homomorphism if it can be split into a mapping function $m: A \to T$, an
associative reduction function $r: T \times T \to T$, a finalization function $f: T \to B$,
and a neutral element $\varepsilon \in T$ with $\forall t \in T: r(t, \varepsilon) = r(\varepsilon, t) = t$. For these
functions, rather than aggregating the whole list at once, one can apply $m$ to
each element, reduce the intermediate results with $r$ in an arbitrary grouping, and
finalize the result with $f$ to get the same value. The second technique, by Li et al. [20],
divides a time interval into panes of equal size. For each pane, we aggregate all
inputs and store only a fixed number of intermediate values. The type system
ensures that sliding windows only occur in periodic streams, so by choosing the
pane size as the inverse of the stream's frequency, paning does not change the result.
StreamLAB provides several predefined aggregation functions, such as count,
integration, summation, product, minimization, and maximization.
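The following Python sketch combines both techniques: a list homomorphism (m, r, f, ε) and a window split into panes whose size equals the period of the periodic stream. The class names ListHom and PanedWindow are hypothetical, and for simplicity the sketch reduces all panes on every tick rather than reusing partial results as in Basin et al. [6].

```python
import operator
from collections import deque
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ListHom:
    """Hypothetical encoding of a list homomorphism (m, r, f, e)."""
    m: Callable[[Any], Any]       # map one input value into T
    r: Callable[[Any, Any], Any]  # associative reduction on T
    f: Callable[[Any], Any]       # finalization T -> B
    e: Any                        # neutral element of r

    def whole(self, xs):
        """Aggregate a complete list at once (reference semantics)."""
        acc = self.e
        for x in xs:
            acc = self.r(acc, self.m(x))
        return self.f(acc)


SUM = ListHom(m=lambda x: x, r=operator.add, f=lambda t: t, e=0)
COUNT = ListHom(m=lambda x: 1, r=operator.add, f=lambda t: t, e=0)


class PanedWindow:
    """Sliding window split into n_panes panes; each pane keeps one value of T."""

    def __init__(self, hom: ListHom, n_panes: int):
        self.hom = hom
        self.panes = deque([hom.e] * n_panes, maxlen=n_panes)
        self.current = hom.e

    def add(self, x):
        """Pre-aggregate an incoming event into the currently open pane."""
        self.current = self.hom.r(self.current, self.hom.m(x))

    def tick(self):
        """At each period of the periodic stream: close the pane and evaluate."""
        self.panes.append(self.current)  # maxlen drops the oldest pane
        self.current = self.hom.e
        acc = self.hom.e
        for p in self.panes:
            acc = self.hom.r(acc, p)
        return self.hom.f(acc)


# 1 s window evaluated at 5 Hz -> 5 panes of 200 ms each.
window = PanedWindow(SUM, n_panes=5)
per_pane_events = [[1, 2], [3], [], [4, 5], [6], [7]]
results = []
for events in per_pane_events:
    for x in events:
        window.add(x)
    results.append(window.tick())
print(results)                     # [3, 6, 6, 15, 21, 25]
print(SUM.whole([3, 4, 5, 6, 7]))  # 25, the same value without panes
```

Memory stays at five pane values plus one open accumulator, however many events arrive per pane.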


3.3 Memory Analysis 


StreamLAB computes the worst-case memory consumption of the specification.
For this, an annotated dependency graph (ADG) is constructed, where each
stream $s$ constitutes a node $v_s$, and whenever $s$ accesses $s'$, there is an edge
from $v_s$ to $v_{s'}$. Edges are annotated according to the type of access: if $s$ accesses
$s'$ discretely with offset $n$ or with a sliding window aggregation of duration $d$
and aggregation function $\gamma$, then the edge $e = (v_s, v_{s'})$ is labeled with $\Lambda(e) = n$
or $\Lambda(e) = (d, \gamma)$, respectively. Nodes of periodic streams are additionally annotated
with their frequency: if stream $s$ has a period of 200 ms, then the node is labeled with
$m(v_s) = 5$ Hz. Memory bounds for discrete-time offsets can be computed as for
Lola [9]. We extend this algorithm with new computational rules to determine
the memory bounds for real-time expressions. For each edge $e = (v, v')$ in the
ADG, we can determine how many events of $v'$ must be stored for the computation
of $v$ using the rules in Fig. 2, which distinguish between aggregation functions that
are list homomorphisms and those that are not. An upper bound on the required memory
is then the sum of the memory requirements of the individual streams. This, however,
only covers the memory needed for storing values and does not take book-keeping
data structures and the internal representation of the specification into account.
Assuming reasonably small expressions (depth < 64), this additional memory can be
bounded by 1 kB per stream plus a flat 10 kB of working memory.
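The summation step can be sketched in Python as follows. The per-edge counts are taken as inputs (they would come from the rules in Fig. 2 or from discrete offsets), 1 kB is taken as 1024 bytes, each stream is assumed to keep at least its latest value, and a stream read by several others is assumed to store the maximum any reader needs. These are illustrative assumptions, and memory_bound is a hypothetical name.

```python
def memory_bound(value_sizes, edges, per_stream=1024, working=10 * 1024):
    """Sketch of a worst-case memory bound in bytes.

    value_sizes: {stream: bytes per stored value}
    edges: (reader, accessed, n) triples, where n is the number of values of
           `accessed` that `reader` needs, or None if that number is unbounded.
    """
    needed = {s: 1 for s in value_sizes}  # assumption: keep the latest value
    for _reader, accessed, n in edges:
        if n is None:
            return None  # unbounded: no fixed memory budget suffices
        needed[accessed] = max(needed[accessed], n)
    values = sum(needed[s] * size for s, size in value_sizes.items())
    # book-keeping: 1 kB per stream plus a flat 10 kB of working memory
    return values + per_stream * len(value_sizes) + working


sizes = {"altitude": 4, "tooLow": 1, "count": 8}
edges = [("tooLow", "altitude", 1), ("count", "tooLow", 1), ("count", "count", 2)]
print(memory_bound(sizes, edges))                   # 13333
print(memory_bound({"a": 4}, [("b", "a", None)]))   # None
```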


Fig. 2. Computation of memory bound over the dependency graph.

Fig. 3. Illustration of the data flow. The EM manages input events, TM schedules
periodic tasks, and Eval manages the evaluation of streams.


4 Processing Engine 


The processing engine consists of three components. The EventManager (EM)
reads events from an input such as standard input or a CSV file and translates string
values into the internal representation. The values are mapped to the corresponding
input streams in the specification. Using a multiple-sender-single-receiver
channel, the EM pushes the event onto a working queue. The TimeManager (TM)
schedules the evaluation of periodic streams. The TM computes the hyper-period
of all periodic streams and groups them by equal deadlines. Whenever a deadline is
due, the corresponding streams are pushed into the working queue using the 
same channel as the EM. This ensures that event-based and periodic evaluation 
cycles occur in the correct order even under high pressure. Lastly, the Evaluator 
(Eval) manages the evaluation of streams and storage of computed values. The 
Eval repeatedly pops items off the working queue and evaluates the respective 
streams. 
When monitoring a system online, the TM uses the internal system clock for
scheduling tasks. When monitoring offline, however, this is no longer possible
because the point in time when a stream is due to be evaluated depends on
the input events. Thus, before the EM pushes an event onto the working queue, it
transmits the latest timestamp to the TM. The TM then decides whether some
periodic streams need to be evaluated. If so, it effectively goes back in time
by pushing the respective tasks onto the working queue before acknowledging the
EM. Only upon receiving the acknowledgement does the EM send the event to the
working queue. Figure 3 illustrates the information flow between the components.


5 Experiments 


StreamLAB¹ is implemented in Rust. A major benefit of a Rust implementation
is the connection to LLVM, which allows compilation for a large variety of
platforms. Moreover, the requirements on the runtime environment are as low as
for C programs. This makes StreamLAB widely applicable.

The specifications presented in Sect. 2.1 have been tested on traces generated
with the state-of-the-art flight simulator ArduPilot². Each trace is the result
of a drone flying one or more round-trips over Saarland University and provides
sensor information for longitude and latitude, true air velocity, wind direction
and speed, as well as the number of available GPS satellites. The longest trace
consists of slightly fewer than 433,000 events. StreamLAB successfully detected
a variety of errors such as delayed sensor readings, GPS module failures, and
phases without significant movement. For online runtime verification, the
monitor reads an event of the simulator's output, processes the input data, and
pauses until the next event is available. Whenever necessary, periodic streams
are evaluated. Online monitoring of a simulation did not allow us to exhaust the
capabilities of StreamLAB because the generation of events took significantly
longer than processing them. The offline monitoring function of StreamLAB
allows the user to specify a delay with which consecutive events are read from a
file. By gradually decreasing the delay between events until the pressure was
too high, we could determine a maximum input frequency of 647.2 kHz. When
disabling the delay and running the monitor at maximum speed, StreamLAB
processes a trace of length 432,961 in 0.67 s, so each event takes about 1545 ns to
process; the three threads utilized 146% of the CPU. In terms of memory, the
maximum resident set size amounted to 16 MB. This includes bookkeeping data
structures, the specification, evaluator code, and parts of the C standard library.
The evaluation does not require any heap allocation after the setup phase, and
the average stack size amounts to less than 1 kB. The experiment was conducted
on a 3.3 GHz Intel Core i7 processor with 16 GB of 2133 MHz LPDDR3 RAM.
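A quick arithmetic check shows that the reported figures are consistent with each other: the per-event time multiplied by the trace length recovers the total runtime, and its reciprocal equals the reported maximum input frequency.

```latex
432{,}961 \times 1545\,\mathrm{ns} \approx 0.669\,\mathrm{s} \approx 0.67\,\mathrm{s},
\qquad
\frac{1}{1545\,\mathrm{ns}} \approx 6.472 \times 10^{5}\,\mathrm{Hz} \approx 647.2\,\mathrm{kHz}
```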


¹ www.stream-lab.org.
² ardupilot.org.
6 Outlook


The stream-based monitoring framework StreamLAB demonstrates the applicability
of stream monitoring for cyber-physical systems. Previous versions of
Lola have successfully been applied to networks and unmanned aircraft systems
in cooperation with the German Aerospace Center (DLR) [1,2,12]. StreamLAB
provides a modular, easy-to-understand specification language and design-time 
feedback for specifiers. This helps to improve the development process for cyber- 
physical systems. Coupled with the promising experimental results, this lays the 
foundation for further applications of the framework on real-world systems. 
Abstract. We present VERIFAI, a software toolkit for the formal design and
analysis of systems that include artificial intelligence (AI) and machine learning 
(ML) components. VERIFAI particularly addresses challenges with applying for- 
mal methods to ML components such as perception systems based on deep neural 
networks, as well as systems containing them, and to model and analyze system 
behavior in the presence of environment uncertainty. We describe the initial ver- 
sion of VERIFAI, which centers on simulation-based verification and synthesis, 
guided by formal models and specifications. We give examples of several use 
cases, including temporal-logic falsification, model-based systematic fuzz test- 
ing, parameter synthesis, counterexample analysis, and data set augmentation. 


Keywords: Formal methods - Falsification - Simulation - 
Cyber-physical systems - Machine learning - Artificial intelligence - 
Autonomous vehicles 


1 Introduction 


The increasing use of artificial intelligence (AI) and machine learning (ML) in systems, 
including safety-critical systems, has brought with it a pressing need for formal meth- 
ods and tools for their design and verification. However, AI/ML-based systems, such as 
autonomous vehicles, have certain characteristics that make the application of formal 
methods very challenging. We mention three key challenges here; see Seshia et al. [23] 
for an in-depth discussion. First, several uses of AI/ML are for perception, the use of 
computational systems to mimic human perceptual tasks such as object recognition and 
classification, conversing in natural language, etc. For such perception components, 
writing a formal specification is extremely difficult, if not impossible. Additionally, the
signals processed by such components can be very high-dimensional, such as streams 
of images or LiDAR data. Second, machine learning being a dominant paradigm in 
AI, formal tools must be compatible with the data-driven design flow for ML and also 
be able to handle the complex, high-dimensional structures in ML components such as 
deep neural networks. Third, the environments in which AI/ML-based systems operate
can be very complex, with considerable uncertainty even about how many (and which)
agents are in the environment (both human and robotic), let alone about their intentions 
and behaviors. As an example, consider the difficulty in modeling urban traffic envi- 
ronments in which an autonomous car must operate. Indeed, AI/ML is often introduced 
into these systems precisely to deal with such complexity and uncertainty! From a for- 
mal methods perspective, this makes it very hard to create realistic environment models 
with respect to which one can perform verification or synthesis. 

In this paper, we introduce the VERIFAI toolkit, our initial attempt to address 
the three core challenges—perception, learning, and environments—that are outlined 
above. VERIFAI takes the following approach: 


• Perception: A perception component maps a concrete feature space (e.g. pixels) to
an output such as a classification, prediction, or state estimate. To deal with the lack 
of specification for perception components, VERIFAI analyzes them in the context 
of a closed-loop system using a system-level specification. Moreover, to scale to 
complex high-dimensional feature spaces, VERIFAI operates on an abstract feature 
space (or semantic feature space) [10] that describes semantic aspects of the envi- 
ronment being perceived, not the raw features such as pixels. 

• Learning: VERIFAI aims to not only analyze the behavior of ML components but
also use formal methods for their (re-)design. To this end, it provides features to 
(i) design the data set for training and testing [9], (ii) analyze counterexamples to 
gain insight into mistakes by the ML model, as well as (iii) synthesize parameters, 
including hyper-parameters for training algorithms and ML model parameters. 

• Environment Modeling: Since it can be difficult, if not impossible, to exhaustively
model the environments of AI-based systems, VERIFAI aims to provide
ways to capture a designer’s assumptions about the environment, including distri- 
bution assumptions made by ML components, and to describe the abstract feature 
space in an intuitive, declarative manner. To this end, VERIFAI provides users with 
SCENIC [12, 13], a probabilistic programming language for modeling environments. 
SCENIC, combined with a renderer or simulator for generating sensor data, can pro- 
duce semantically-consistent input for perception components. 


VERIFAI is currently focused on AI-based cyber-physical systems (CPS), although
its basic ideas can also be applied to other AI-based systems. As a pragmatic choice, we
focus on simulation-based verification, where the simulator is treated as a black box,
so as to be broadly applicable to the range of simulators used in industry.¹ The input to
VERIFAI is a "closed-loop" CPS model, comprising a composition of the AI-based CPS
system under verification with an environment model, and a property on the closed-loop
model. The AI-based CPS typically comprises a perception component (not necessarily
based on ML), a planner/controller, and the plant (i.e., the system under control).
Given these, VERIFAI offers the following use cases: (1) temporal-logic falsification;
(2) model-based fuzz testing; (3) counterexample-guided data augmentation;
(4) counterexample (error table) analysis; (5) hyper-parameter synthesis; and (6) model
parameter synthesis. The novelty of VERIFAI is that it is the first tool to offer this suite of use
cases in an integrated fashion, unified by a common representation of an abstract feature
space, with an accompanying modeling language and search algorithms over this feature
space, all provided in a modular implementation. The algorithms and formalisms
in VERIFAI are presented in papers published by the authors in other venues (e.g.,
[7-10, 12, 15, 22]). The problem of temporal-logic falsification or simulation-based
verification of CPS models is well studied and several tools exist (e.g., [3, 11]); our work was
the first to extend these techniques to CPS models with ML components [7,8]. Work
on verification of ML components, especially neural networks (e.g., [14,26]), is
complementary to the system-level analysis performed by VERIFAI. Fuzz testing based on
formal models is common in software engineering (e.g., [16]), but our work is unique in
the CPS context. Similarly, property-directed parameter synthesis has also been studied
in the formal methods/CPS community, but our work is the first to apply these ideas to
the synthesis of hyper-parameters for ML training and ML model parameters. Finally,
to our knowledge, our work on augmenting training/test data sets [9], implemented in
VERIFAI, is the first use of formal techniques for this purpose. In Sect. 2, we describe
how the tool is structured so as to provide the above features. Sect. 3 illustrates the use
cases via examples from the domain of autonomous driving.

¹ Our work is complementary to the work on industrial-grade simulators for AI/ML-based CPS.
In particular, VERIFAI enhances such simulators by providing formal methods for modeling
(via the SCENIC language), analysis (via temporal logic falsification), and parameter synthesis
(via property-directed hyper/model-parameter synthesis).


2 VERIFAI Structure and Operation 


VERIFAI is currently focused on simulation-based analysis and design of AI compo- 
nents for perception or control, potentially those using ML, in the context of a closed- 
loop cyber-physical system. Figure 1 depicts the structure and operation of the toolkit. 


Inputs and Outputs: Using VERIFAI requires setting up a simulator for the domain 
of interest. As we explain in Sect. 3, we have experimented with multiple robotics 
simulators and provide an easy interface to connect a new simulator. The user then
constructs the inputs to VERIFAI, including (i) a simulatable model of the system, including
code for one or more controllers and perception components, and a dynamical model 
of the system being controlled; (ii) a probabilistic model of the environment, specifying 
constraints on the workspace, the locations of agents and objects, and the dynamical 
behavior of agents, and (iii) a property over the composition of the system and its envi- 
ronment. VERIFAI is implemented in Python for interoperability with ML/AI libraries 
and simulators across platforms. The code for the controller and perception component 
can be arbitrary executable code, invoked by the simulator. The environment model 
typically comprises a definition in the simulator of the different types of agents, plus a 
description of their initial conditions and other parameters using the SCENIC
probabilistic programming language [12]. Finally, the property to be checked can be expressed
using Metric Temporal Logic (MTL) [2,24], objective functions, or arbitrary code
monitoring the property. The output of VERIFAI depends on the feature being invoked. For
falsification, VERIFAI returns one or more counterexamples, i.e., simulation traces
violating the property [7]. For fuzz testing, VERIFAI produces traces sampled from the
distribution of behaviors induced by the probabilistic environment model [12]. Error table
analysis involves collecting counterexamples generated by the falsifier into a table, on
which we perform analysis to identify features that are correlated with property failures.
Data augmentation uses falsification and error table analysis to generate additional data
for training and testing an ML component [9]. Finally, the property-driven synthesis of
model parameters or hyper-parameters generates as output a parameter evaluation that
satisfies the specified property.

Fig. 1. Structure and operation of VERIFAI.
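To illustrate the falsification and error-table use cases, the Python sketch below runs a passive (uniform random) falsifier over a two-dimensional abstract feature space and evaluates the property "always gap > 0" through its robustness, the minimum gap over the trace. The braking "simulator", its deceleration, the feature ranges, and all function names are hypothetical stand-ins; this is not VERIFAI's API.

```python
import random

DECEL = 8.0  # m/s^2; hypothetical braking deceleration of the toy simulator


def simulate(point, dt=0.1):
    """Toy closed-loop run: a car at `speed` m/s brakes toward an obstacle
    `gap0` m ahead. Returns the trace of gaps until the car stops."""
    gap0, speed = point
    gap, v, trace = gap0, speed, [gap0]
    while v > 0:
        if v <= DECEL * dt:                    # stops within this step
            gap -= v * v / (2 * DECEL)
            v = 0.0
        else:                                  # constant deceleration for dt
            gap -= (v - DECEL * dt / 2) * dt
            v -= DECEL * dt
        trace.append(gap)
    return trace


def robustness(trace):
    """Robustness of 'always gap > 0': the minimum gap over the trace."""
    return min(trace)


def falsify(sample, n_samples=200, seed=0):
    """Passive falsification with uniform random sampling; returns an error
    table of (abstract feature point, robustness) rows for violations."""
    rng = random.Random(seed)
    table = []
    for _ in range(n_samples):
        point = sample(rng)
        rho = robustness(simulate(point))
        if rho < 0:
            table.append((point, rho))
    return table


# Abstract feature space: initial gap in [10, 60] m, speed in [5, 30] m/s.
table = falsify(lambda rng: (rng.uniform(10, 60), rng.uniform(5, 30)))
print(len(table), "counterexamples")
print(min(speed for (gap0, speed), _ in table))  # smallest failing speed
```

The last line is a tiny error-table analysis: since every failure needs speed² / (2 · 8) > gap0 ≥ 10, all counterexamples have speeds above roughly 12.6 m/s.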


Tool Structure: VERIFAI is composed of four main modules, as described below: 


• Abstract Feature Space and SCENIC Modeling Language: The abstract feature space
is a compact representation of the possible configurations of the simulation. Abstract 
features can represent parameters of the environment, controllers, or of ML compo- 
nents. For example, when analyzing a visual perception system for an autonomous 
car, an abstract feature space could consist of the initial poses and types of all vehi- 
cles on the road. Note that this abstract space, compared to the concrete feature space 
of pixels used as input to the controller, is better suited to the analysis of the overall 
closed-loop system (e.g. finding conditions under which the car might crash). 


VERIFAI provides two ways to construct abstract feature spaces. They can be con- 
structed hierarchically, combining basic domains such as hyperboxes and finite sets 
into structures and arrays. For example, we could define a space for a car as a struc- 
ture combining a 2D box for position with a 1D box for heading, and then create an 
array of these to get a space for several cars. Alternatively, VERIFAI allows a feature 
space to be defined using a program in the SCENIC language [12]. SCENIC provides 
convenient syntax for describing geometric configurations and agent parameters, 
and, as a probabilistic programming language, allows placing a distribution over the 
feature space which can be conditioned by declarative constraints. 
• Searching the Feature Space: Once the abstract feature space is defined, the next
step is to search that space to find simulations that violate the property or pro- 
duce other interesting behaviors. Currently, VERIFAI uses a suite of sampling meth- 
ods (both active and passive) for this purpose, but in the future we expect to also 
integrate directed or exhaustive search methods including those from the adver- 
sarial machine learning literature (e.g., see [10]). Passive samplers, which do not 
use any feedback from the simulation, include uniform random sampling, simu- 
lated annealing, and Halton sequences [18] (quasi-random deterministic sequences 
with low-discrepancy guarantees we found effective for falsification [7]). Distribu- 
tions defined using SCENIC are also passive in this sense. Active samplers, whose 
selection of samples is informed by feedback from previous simulations, include 
cross-entropy sampling and Bayesian optimization. The former selects samples and 
updates the prior distribution by minimizing cross-entropy; the latter updates the 
prior from the posterior over a user-provided objective function, e.g. the satisfaction 
level of a specification or the loss of an analyzed model. 

• Property Monitor: Trajectories generated by the simulator are
evaluated by the monitor, which produces a score for a given property or 
objective function. VERIFAI supports monitoring MTL properties using the 
py-metric-temporal-logic [24] package, including both the Boolean and 
quantitative semantics of MTL. As mentioned above, the user can also specify a cus- 
tom monitor as a Python function. The result of the monitor can be used to output 
falsifying traces and also as feedback to the search procedure to direct the sampling 
(search) towards falsifying scenarios. 

• Error Table Analysis: Counterexamples are stored in a data structure called the error
table, whose rows are counterexamples and columns are abstract features. The error 
table can be used offline to debug (explain) the generated counterexamples or online 
to drive the sampler towards particular areas of the abstract feature space. VERIFAI 
provides different techniques for error table analysis depending on the end use (e.g., 
counter-example analysis or data set augmentation), including principal component 
analysis (PCA) for ordered feature domains and subsets of the most recurrent values 
for unordered domains (see [9] for further details). 
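The hierarchical construction described in the first module can be sketched in plain Python. This is a minimal illustration under our own assumptions: `Box`, `FiniteSet`, `Struct`, and `Array` are hypothetical stand-ins written for this example rather than VERIFAI's actual classes, the numeric ranges are invented, and sampling is uniform at random.

```python
import random
from dataclasses import dataclass
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical domain classes for a hierarchical abstract feature space.
# They illustrate the idea only and are not VERIFAI's API.


@dataclass
class Box:
    """Hyperbox: one (low, high) interval per dimension."""
    intervals: Sequence[Tuple[float, float]]

    def sample(self, rng: random.Random) -> Tuple[float, ...]:
        return tuple(rng.uniform(lo, hi) for lo, hi in self.intervals)


@dataclass
class FiniteSet:
    """Finite, unordered domain (e.g. car colors)."""
    values: Sequence[Any]

    def sample(self, rng: random.Random) -> Any:
        return rng.choice(list(self.values))


@dataclass
class Struct:
    """Named combination of sub-domains."""
    members: Dict[str, Any]

    def sample(self, rng: random.Random) -> Dict[str, Any]:
        return {name: dom.sample(rng) for name, dom in self.members.items()}


@dataclass
class Array:
    """Fixed-length array whose elements all come from one sub-domain."""
    element: Any
    length: int

    def sample(self, rng: random.Random) -> List[Any]:
        return [self.element.sample(rng) for _ in range(self.length)]


# One car: a 2D box for position and a 1D box for heading (made-up ranges),
# plus a finite set of colors added here purely for illustration.
car = Struct({
    "position": Box([(-10.0, 10.0), (0.0, 50.0)]),
    "heading": Box([(-0.5, 0.5)]),
    "color": FiniteSet(["red", "green", "blue"]),
})
cars = Array(car, length=3)  # a space for several cars

if __name__ == "__main__":
    print(cars.sample(random.Random(0)))
```

Because each domain only needs a `sample` method in this sketch, structures and arrays nest to any depth.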

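Among the passive samplers listed in the search module, the Halton construction is short enough to show in full. The sketch below is a generic textbook version (the radical inverse of the sample index in a distinct prime base per dimension), not VERIFAI's code; the `scale` helper that maps points into a hyperbox and the example box are our own additions.

```python
from typing import List, Sequence, Tuple


def radical_inverse(index: int, base: int) -> float:
    """Van der Corput radical inverse: reflect the base-`base` digits of
    `index` about the radix point, giving a value in [0, 1)."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton(n: int, bases: Sequence[int] = (2, 3)) -> List[List[float]]:
    """First n Halton points (indices 1..n), one prime base per dimension."""
    return [[radical_inverse(i, b) for b in bases] for i in range(1, n + 1)]


def scale(point: Sequence[float], box: Sequence[Tuple[float, float]]) -> List[float]:
    """Map a point of the unit cube onto a hyperbox of (low, high) intervals."""
    return [lo + u * (hi - lo) for u, (lo, hi) in zip(point, box)]


# Five deterministic, well-spread samples of (speed, heading) in a made-up box.
for p in halton(5):
    print(scale(p, [(5.0, 15.0), (-0.5, 0.5)]))
```

Since the sequence is deterministic, a falsification run driven by it is reproducible without fixing a random seed.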

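The quantitative semantics used by the property monitor can be illustrated for two bounded temporal operators over a sampled trace. This is our own minimal discrete-time sketch (formulas evaluated at time 0, atomic predicates of the form signal ≥ c) and it does not use the py-metric-temporal-logic API; the trace and the 2 m threshold are hypothetical. A positive value means the property holds with that margin and a negative value means it is violated, which is what lets an active sampler steer toward low scores.

```python
import math
from typing import List, Tuple

Trace = List[Tuple[float, float]]  # (time, signal value) samples


def margin(trace: Trace, threshold: float) -> Trace:
    """Robustness of the atomic predicate `signal >= threshold` at each sample."""
    return [(t, v - threshold) for t, v in trace]


def always(m: Trace, a: float = 0.0, b: float = math.inf) -> float:
    """Quantitative G_[a,b] at time 0: the worst margin over samples in [a, b]."""
    vals = [r for t, r in m if a <= t <= b]
    return min(vals) if vals else math.inf   # vacuously true on an empty window


def eventually(m: Trace, a: float = 0.0, b: float = math.inf) -> float:
    """Quantitative F_[a,b] at time 0: the best margin over samples in [a, b]."""
    vals = [r for t, r in m if a <= t <= b]
    return max(vals) if vals else -math.inf  # vacuously false on an empty window


# Hypothetical distance-to-obstacle trace and the property G(distance >= 2 m).
distance = [(0.0, 10.0), (1.0, 5.0), (2.0, 3.0), (3.0, 4.0)]
rho = always(margin(distance, 2.0))
print(rho)  # 1.0: satisfied, with 1 m to spare
```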
The communication between VERIFAI and the simulator is implemented in a client- 
server fashion using IPv4 sockets, where VERIFAI sends configurations to the simulator 
which then returns trajectories (traces). This architecture allows easy interfacing
with a simulator, or even with multiple simulators at the same time.
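A toy version of this client-server exchange on the loopback interface may make the data flow concrete. It is a sketch under our own assumptions: the newline-delimited JSON format, the field names (`x0`, `speed`, `steps`, `trace`), and the stand-in simulator dynamics are hypothetical and do not describe VERIFAI's actual wire protocol.

```python
import json
import socket
import threading
from typing import Any, Dict, List

# Toy configuration -> trajectory exchange over IPv4 TCP.
# The newline-delimited JSON protocol and all field names are hypothetical.


def simulator_server(server_sock: socket.socket) -> None:
    """Stand-in simulator: serve one request, then return."""
    conn, _ = server_sock.accept()
    with conn, conn.makefile("rw", encoding="utf-8") as f:
        config = json.loads(f.readline())
        # Hypothetical dynamics: constant-speed motion along one axis.
        x0, v = config["x0"], config["speed"]
        trace = [[t, x0 + v * t] for t in range(config["steps"])]
        f.write(json.dumps({"trace": trace}) + "\n")
        f.flush()


def request_trace(host: str, port: int, config: Dict[str, Any]) -> List[List[float]]:
    """VERIFAI-side client: send one configuration and wait for the trace."""
    with socket.create_connection((host, port)) as sock, \
            sock.makefile("rw", encoding="utf-8") as f:
        f.write(json.dumps(config) + "\n")
        f.flush()
        return json.loads(f.readline())["trace"]


def run_demo(config: Dict[str, Any]) -> List[List[float]]:
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # IPv4, TCP
    server.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
    server.listen(1)
    port = server.getsockname()[1]
    worker = threading.Thread(target=simulator_server, args=(server,))
    worker.start()
    try:
        return request_trace("127.0.0.1", port, config)
    finally:
        worker.join()
        server.close()


if __name__ == "__main__":
    print(run_demo({"x0": 1.0, "speed": 2.0, "steps": 3}))
```

In this design the simulator side only has to read a configuration and write back a trace, so it could equally run in a separate process or on another host.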


3 Features and Case Studies 


This section illustrates the main features of VERIFAI through case studies demonstrat- 
ing its various use cases and simulator interfaces. Specifically, we demonstrate model 
falsification and fuzz testing of an autonomous vehicle (AV) controller, data augmenta- 
tion and error table analysis for a convolutional neural network, and model and hyper- 
parameter tuning for a reinforcement learning-based controller. 
3.1 Falsification and Fuzz Testing


VERIFAI offers a convenient way to debug systems through systematic testing. Given 
a model and a specification, the tool can use active sampling to automatically search 
for inputs driving the model towards a violation of the specification. VERIFAI can also 
perform model-based fuzz testing, exploring random variations of a scenario guided 
by formal constraints. To demonstrate falsification and fuzz testing, we consider two 
scenarios involving AVs simulated with the robotics simulator Webots [25]. For the 
experiments reported here, we used Webots 2018, which is commercial software.

In the first example, we falsify the controller of an AV which is responsible for 
safely maneuvering around a disabled car and traffic cones which are blocking the 
road. We implemented a hybrid controller which relies on perception modules for 
state estimation. Initially, the car follows its lane using standard computer vision (non- 
ML) techniques for line detection [20]. At the same time, a neural network (based on 
squeezeDet [27]) estimates the distance to the cones. When the distance drops below 
15 m, the car performs a lane change, afterward switching back to lane-following. 
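The switching logic of this hybrid controller can be summarized as a two-mode state machine. In the sketch below the perception output is abstracted to a single distance estimate; the mode names, the `lane_change_done` signal, and the latch that prevents a second lane change are our own assumptions. A perception error that under-reports the distance triggers the maneuver early, which is the kind of failure the falsifier looks for.

```python
from dataclasses import dataclass
from typing import Optional

LANE_CHANGE_TRIGGER_M = 15.0  # distance threshold stated in the text


@dataclass
class HybridController:
    """Mode-switching logic of the lane-change controller (illustrative only)."""
    mode: str = "lane_follow"
    changed_once: bool = False  # hypothetical latch: perform one maneuver only

    def step(self, cone_distance: Optional[float], lane_change_done: bool) -> str:
        """cone_distance: NN estimate in meters (None if no cones detected);
        lane_change_done: hypothetical signal from the low-level controller."""
        if self.mode == "lane_follow":
            if (not self.changed_once and cone_distance is not None
                    and cone_distance < LANE_CHANGE_TRIGGER_M):
                self.mode = "lane_change"
        elif lane_change_done:
            self.mode = "lane_follow"
            self.changed_once = True
        return self.mode
```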

The correctness of the AV is characterized by an MTL formula requiring the vehi- 
cle to maintain a minimum distance from the traffic cones and avoid overshoot while 
changing lanes. The task of the falsifier is to find small perturbations of the initial scene 
(generated by SCENIC) which cause the vehicle to violate this specification. We allowed 
perturbations of the initial positions and orientations of all objects, the color of the dis- 
abled car, and the cruising speed and reaction time of the ego car. 

Our experiments showed that active samplers driven by the robustness of the MTL 
specification can efficiently discover scenes that confuse the controller and yield faulty 
behavior. Figure 2 shows an example, where the neural network detected the orange car 
instead of the traffic cones, causing the lane change to be initiated too early. As a result, 
the controller performed only an incomplete lane change, leading to a crash. 


Fig. 2. A falsifying scene automatically discovered by VERIFAI. The neural network misclassifies 
the traffic cones because of the orange vehicle in the background, leading to a crash. Left: bird’s- 
eye view. Right: dash-cam view, as processed by the neural network. 


In our second experiment, we used VERIFAI to simulate variations on an actual 
accident involving an AV [5]. The AV, proceeding straight through an intersection, was 
hit by a human-driven car turning left. Neither car was able to see the other because of two lanes of
stopped traffic. Figure 3 shows a (simplified) SCENIC program we wrote to reproduce 


438 T. Dreossi et al. 


# Car going straight
ego = Car on egoLane.median

# Car turning left
Car on leftTurnLane.median

# A car blocking the Ego's view
spot = OrientedPoint on blockLane.median
laneNoise = (-0.5, 0.5)
Car at spot offset by laneNoise @ 0

# Another car 5-8 m behind that
Car at spot2 offset by laneNoise @ (-5, -8)


Fig. 3. Left: Partial SCENIC program for the crash scenario. Car is an object class defined in the 
Webots world model (not shown), on is a SCENIC specifier positioning the object uniformly at 
random in the given region (e.g. the median line of a lane), (-0.5, 0.5) indicates a uniform 
distribution over that interval, and X @ Y creates a vector with the given coordinates (see [12] 
for a complete description of SCENIC syntax). Right: (1) initial scene sampled from the program; 
(2) the red car begins its turn, unable to see the green car; (3) the resulting collision. (Color figure 
online) 


the accident, allowing variation in the initial positions of the cars. We then ran simu- 
lations from random initial conditions sampled from the program, with the turning car 
using a controller trying to follow the ideal left-turn trajectory computed from Open- 
StreetMap data using the Intelligent Intersections Toolbox [17]. The car going straight 
used a controller which either maintained a constant velocity or began emergency
braking in response to a message from a simulated “smart intersection” warning about the
turning car. By sampling variations on the initial conditions, we could determine how 
much advance notice is necessary for such a system to robustly avoid an accident. 


3.2 Data Augmentation and Error Table Analysis 


Data augmentation is the process of supplementing training sets with the goal of
improving the performance of ML models. Typically, datasets are augmented with
transformed versions of preexisting training examples. In [9], we showed that
augmentation with counterexamples is also an effective method for model improvement.

Fig. 4. This image generated by our renderer was misclassified by the NN. The network
reported detecting only one car when there were two.


VERIFAI implements a counterexample-guided augmentation scheme, where a fal- 
sifier (see Sect. 3.1) generates misclassified data points that are then used to augment the 
original training set. The user can choose among different sampling methods, with pas- 
sive samplers suited to generating diverse sets of data points while active samplers can 
efficiently generate similar counterexamples. In addition to the counterexamples them- 
selves, VERIFAI also returns an error table aggregating information on the misclassifi- 
cations that can be used to drive the retraining process. Figure 4 shows the rendering of 
a misclassified sample generated by our falsifier. 
For our experiments, we implemented a renderer that generates images of road
scenarios and tested the quality of our augmentation scheme on the squeezeDet
convolutional neural network [27], trained for classification. We adopted three techniques to
select augmentation images: (1) randomly sampling from the error table, (2) selecting 
the top k-closest (similar) samples from the error table, and (3) using PCA analysis to 
generate new samples. For details on the renderer and the results of counterexample- 
driven augmentation, see [9]. We show that incorporating the generated counterexam- 
ples during re-training improves the accuracy of the network. 
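Selection techniques (1) and (2) are simple operations on the rows of the error table. A minimal sketch, assuming each row is a numeric feature vector on comparable scales and that closeness means Euclidean distance to a chosen reference counterexample (both our assumptions; [9] gives the actual procedure):

```python
import math
import random
from typing import List, Sequence

Row = Sequence[float]  # one counterexample: its abstract feature values


def random_rows(error_table: List[Row], k: int, rng: random.Random) -> List[Row]:
    """Technique (1): k rows drawn uniformly without replacement."""
    return rng.sample(error_table, k)


def top_k_closest(error_table: List[Row], reference: Row, k: int) -> List[Row]:
    """Technique (2): the k rows nearest to `reference` in Euclidean distance."""
    return sorted(error_table, key=lambda row: math.dist(row, reference))[:k]
```

With features of very different magnitudes, normalizing each column first would keep one feature from dominating the distance.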


3.3 Model Robustness and Hyperparameter Tuning 


In this final section, we demonstrate how VERIFAI can be used to tune test parameters 
and hyperparameters of AI systems. For the following case studies, we use OpenAI 
Gym [4], a framework for experimenting with reinforcement learning algorithms. 

First, we consider the problem of testing the robustness of a learned controller for 
a cart-pole, i.e., a cart that balances an inverted pendulum. We trained a neural net- 
work to control the cart-pole using Proximal Policy Optimization algorithms [21] with 
100k training episodes. We then used VERIFAI to test the robustness of the learned
controller, varying the initial lateral position and rotation of the cart as well as the mass 
and length of the pole. Even for apparently robust controllers, VERIFAI was able to 
discover configurations for which the cart-pole failed to self-balance. Figure 5 shows 
1000 iterations of the falsifier, where sampling was guided by the reward function used 
by OpenAI to train the controller. This function provides a negative reward if the cart
moves more than 2.4 m or if at any time the angle maintained by the pole is greater than 
12°. For testing, we slightly modified these thresholds. 
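A specification of this kind can be scored along the lines of MTL's quantitative semantics. The sketch below is our own: it normalizes each constraint's slack by its limit so the two can be compared, and takes the worst value over a trajectory of (cart position, pole angle) pairs. The defaults are the thresholds quoted above; the modified values used for testing are not given, so they are left as parameters.

```python
import math
from typing import Iterable, Tuple

X_LIMIT_M = 2.4                       # threshold quoted in the text
ANGLE_LIMIT_RAD = math.radians(12.0)  # 12 degrees


def cartpole_robustness(states: Iterable[Tuple[float, float]],
                        x_limit: float = X_LIMIT_M,
                        angle_limit: float = ANGLE_LIMIT_RAD) -> float:
    """Worst normalized slack of |x| <= x_limit and |theta| <= angle_limit.
    Positive: both constraints held at every step; negative: violated."""
    worst = math.inf
    for x, theta in states:
        worst = min(worst,
                    (x_limit - abs(x)) / x_limit,
                    (angle_limit - abs(theta)) / angle_limit)
    return worst
```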


[Plot residue; recoverable axis label: mass of pole]


Fig. 5. The green dots represent model parameters for which the cart-pole controller behaved 
correctly, while the red dots indicate specification violations. Out of 1000 randomly-sampled 
model parameters, the controller failed to satisfy the specification 38 times. (Color figure online) 


Finally, we used VERIFAI to study the effects of hyperparameters when training a 
neural network controller for a mountain car. In this case, the controller must learn to 
exploit momentum in order to climb a steep hill. Here, rather than searching for
counterexamples, we look for a set of hyperparameters under which the network correctly
learns to control the car. Specifically, we explored the effects of using different training 
algorithms (from a discrete set of choices) and the size of the training set. We used the 
VERIFAI falsifier to search the hyperparameter space, guided again by the reward func- 
tion provided by OpenAI Gym (here the distance from the goal position), but negated 
so that falsification implied finding a controller which successfully climbs the hill. In 
this way VERIFAI built a table of safe hyperparameters. PCA analysis then revealed 
which hyperparameters the training process is most sensitive or robust to. 
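The negation trick can be sketched as follows. We assume a user-supplied, hypothetical `train_and_evaluate` function that trains a controller for given hyperparameters and reports its final distance to the goal; the algorithm names, the training-set size range, the goal tolerance, and the use of uniform random sampling are all our own choices, and VERIFAI's falsifier could use any of its samplers instead.

```python
import random
from typing import Any, Callable, Dict, List

# Hypothetical discrete set of training algorithms and training-set size range.
ALGORITHMS = ["algo_a", "algo_b", "algo_c"]
SIZE_RANGE = (1_000, 100_000)


def search_safe_hyperparameters(
        train_and_evaluate: Callable[[Dict[str, Any]], float],
        n_samples: int,
        rng: random.Random,
        tolerance: float = 0.1) -> List[Dict[str, Any]]:
    """Random search over hyperparameters. The specification being falsified is
    'the trained car does NOT reach the goal', so a falsifying sample (negative
    score) is a set of hyperparameters that yields a successful controller."""
    safe_table = []
    for _ in range(n_samples):
        params = {"algorithm": rng.choice(ALGORITHMS),
                  "train_size": rng.randint(*SIZE_RANGE)}
        distance = train_and_evaluate(params)
        score = distance - tolerance  # negative iff within `tolerance` of the goal
        if score < 0:
            safe_table.append({**params, "distance": distance})
    return safe_table
```

The resulting table of safe hyperparameters has the same shape as an error table, so the same analyses (such as PCA) can be applied to it.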


4 Conclusion 


We presented VERIFAI, a toolkit for the formal design and analysis of AI/ML-based 
systems. Our implementation, together with the examples described in Sect. 3, is available in
the tool distribution [1], which includes detailed instructions and expected output.

In future work, we plan to explore additional applications of VERIFAI, and to 
expand its functionality with new algorithms. Towards the former, we have already 
interfaced VERIFAI to the CARLA driving simulator [6], for more sophisticated exper- 
iments with autonomous cars, as well as to the X-Plane flight simulator [19], for testing 
an ML-based aircraft navigation system. More broadly, although our focus has been 
on CPS, we note that VERIFAI’s architecture is applicable to other types of systems. 
Finally, for extending VERIFAI itself, we plan to move beyond directed simulation by 
incorporating symbolic methods, such as those used in finding adversarial examples. 
Abstract. Deep neural networks are revolutionizing the way complex
systems are designed. Consequently, there is a pressing need for tools and 
techniques for network analysis and certification. To help in addressing 
that need, we present Marabou, a framework for verifying deep neural 
networks. Marabou is an SMT-based tool that can answer queries about 
a network’s properties by transforming these queries into constraint sat- 
isfaction problems. It can accommodate networks with different activa- 
tion functions and topologies, and it performs high-level reasoning on the 
network that can curtail the search space and improve performance. It 
also supports parallel execution to further enhance scalability. Marabou 
accepts multiple input formats, including protocol buffer files generated 
by the popular TensorFlow framework for neural networks. We describe 
the system architecture and main components, evaluate the technique 
and discuss ongoing work. 


1 Introduction 


Recent years have brought about a major change in the way complex systems are 
being developed. Instead of spending long hours hand-crafting complex software, 
many engineers now opt to use deep neural networks (DNNs) [6,19]. DNNs are 
machine learning models, created by training algorithms that generalize from a 
finite set of examples to previously unseen inputs. Their performance can often 
surpass that of manually created software as demonstrated in fields such as image 
classification [16], speech recognition [8], and game playing [21]. 

Despite their overall success, the opacity of DNNs is a cause for concern, 
and there is an urgent need for certification procedures that can provide rig- 
orous guarantees about network behavior. The formal methods community has 
taken initial steps in this direction, by developing algorithms and tools for neural
network verification [5,9,10,12,18,20,23,24]. A DNN verification query consists
of two parts: (i) a neural network, and (ii) a property to be checked; and its 
result is either a formal guarantee that the network satisfies the property, or a 
concrete input for which the property is violated (a counter-example). A verifica- 
tion query can encode the fact, e.g., that a network is robust to small adversarial 
perturbations in its input [22]. 

A neural network consists of neurons organized in layers. The network
is evaluated by assigning values to the neurons in the input layer, and then using 
these values to iteratively compute the assignments of neurons in each succeeding 
layer. Finally, the values of neurons in the last layer are computed, and this is the 
network’s output. A neuron’s assignment is determined by computing a weighted 
sum of the assignments of neurons from the preceding layer, and then applying 
to the result a non-linear activation function, such as the Rectified Linear Unit 
(ReLU) function, ReLU(x) = max(0, x). Thus, a network can be regarded as a
set of linear constraints (the weighted sums), and a set of non-linear constraints 
(the activation functions). In addition to a neural network, a verification query 
includes a property to be checked, which is given in the form of linear or non- 
linear constraints on the network’s inputs and outputs. The verification problem 
thus reduces to finding an assignment of neuron values that satisfies all the 
constraints simultaneously, or determining that no such assignment exists. 
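To make the evaluation rule concrete, here is a small forward pass in Python. The weights are made up, and we assume (as is common, though the text does not say so) that the output layer applies no activation. Each weighted sum corresponds to one linear constraint and each `relu` call to one non-linear constraint, so the values computed along the way form one satisfying assignment of the network's constraints for the given input.

```python
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weight matrix, bias vector)


def relu(z: float) -> float:
    return max(0.0, z)


def evaluate(layers: Sequence[Layer], inputs: Sequence[float]) -> List[float]:
    """Forward pass: weighted sums (linear constraints) followed by ReLU
    (non-linear constraints) on every layer except the output layer."""
    values = list(inputs)
    for idx, (weights, biases) in enumerate(layers):
        weighted = [sum(w * v for w, v in zip(row, values)) + b
                    for row, b in zip(weights, biases)]
        is_output = idx == len(layers) - 1
        values = weighted if is_output else [relu(z) for z in weighted]
    return values


# A 2-2-1 network with made-up weights.
net = [
    ([[1.0, 1.0], [2.0, -1.0]], [0.0, 0.0]),  # hidden layer (ReLU)
    ([[1.0, -1.0]], [0.5]),                   # output layer (no activation)
]
print(evaluate(net, [1.0, -2.0]))  # [-3.5]
```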

This paper presents a new tool for DNN verification and analysis, called 
Marabou. The Marabou project builds upon our previous work on the Reluplex 
project [2,7,12,13,15,17], which focused on applying SMT-based techniques to 
the verification of DNNs. Marabou follows the Reluplex spirit in that it applies an 
SMT-based, lazy search technique: it iteratively searches for an assignment that 
satisfies all given constraints, but treats the non-linear constraints lazily in the 
hope that many of them will prove irrelevant to the property under consideration, 
and will not need to be addressed at all. In addition to search, Marabou performs 
deduction aimed at learning new facts about the non-linear constraints in order 
to simplify them. 

The Marabou framework is a significant improvement over its predecessor, 
Reluplex. Specifically, it includes the following enhancements and modifications: 


— Native support for fully connected and convolutional DNNs with arbitrary 
piecewise-linear activation functions. This extends the Reluplex algorithm, 
which was originally designed to support only ReLU activation functions. 

— Built-in support for a divide-and-conquer solving mode, in which the solver is 
run with an initial (small) timeout. If the timeout is reached, the solver par- 
titions its input query into simpler sub-queries, increases the timeout value, 
and repeats the process on each sub-query. This mode naturally lends itself 
to parallel execution by running sub-queries on separate nodes; however, it 
can yield significant speed-ups even when used with a single node. 

— A complete simplex-based linear programming core that replaces the exter- 
nal solver (GLPK) that was previously used in Reluplex. The new simplex 
core was tailored for a smooth integration with the Marabou framework and
eliminates much of the overhead in Reluplex due to the use of GLPK. 

— Multiple interfaces for feeding queries into the solver. A query’s neural
network can be provided in a textual format or as a protocol buffer (protobuf)
file containing a TensorFlow model; and the property can be either compiled 
into the solver, provided in Python, or stored in a textual format. We expect 
these interfaces will simplify usage of the tool for many users. 

— Support for network-level reasoning and deduction. The earlier Reluplex tool 
performed deductions at the level of single constraints, ignoring the input 
network’s topology. In Marabou, we retain this functionality but also include 
support for reasoning based on the network topology, such as symbolic bound 
tightening [23]. This allows for efficient curtailment of the search space. 


Marabou is available online [14] under the permissive modified BSD license. 


[Diagram of MARABOU: Query input; Search (Piecewise-Linear Constraints); Deduction (Constraint-Level Reasoning, Network-Level Reasoning); UNSAT output]


Fig. 1. The main components of Marabou. 


2 Design of Marabou 


Marabou regards each neuron in the network as a variable and searches for a 
variable assignment that simultaneously satisfies the query’s linear constraints 
and non-linear constraints. At any given point, Marabou maintains the current 
variable assignment, lower and upper bounds for every variable, and the set of 
current constraints. In each iteration, it then changes the variable assignment 
in order to (1) correct a violated linear constraint, or (2) correct a violated 
non-linear constraint. 

The Marabou verification procedure is sound and complete, i.e. the afore- 
mentioned loop eventually terminates. This can be shown via a straightforward 
extension of the soundness and completeness proof for Reluplex [12]. However, 
in order to guarantee termination, Marabou only supports activation functions 
that are piecewise-linear. The tool already has built-in support for the ReLU 
function and the Max function max(x₁, ..., xₙ), and it is modular in the sense
that additional piecewise-linear functions can be added easily. 
Another important aspect of Marabou’s verification strategy is deduction—
specifically, the derivation of tighter lower and upper variable bounds. The moti- 
vation is that such bounds may transform piecewise-linear constraints into lin- 
ear constraints, by restricting them to one of their linear segments. To achieve 
this, Marabou repeatedly examines linear and non-linear constraints, and also 
performs network-level reasoning, with the goal of discovering tighter variable 
bounds. 

Next, we describe Marabou’s main components (see also Fig. 1). 


2.1 Simplex Core (Tableau and BasisFactorization Classes) 


The simplex core is the part of the system responsible for making the variable 
assignment satisfy the linear constraints. It does so by implementing a variant 
of the simplex algorithm [3]. In each iteration, it changes the assignment of some 
variable x, and consequently the assignment of any variable y that is connected 
to x by a linear equation. Selecting x and determining its new assignment is 
performed using standard algorithms—specifically, the revised simplex method 
in which the various linear constraints are kept in implicit matrix form, and the 
steepest-edge and Harris’ ratio test strategies for variable selection. 

Creating an efficient simplex solver is complicated. In Reluplex, we delegated 
the linear constraints to an external solver, GLPK. Our motivation for imple- 
menting a new custom solver in Marabou was twofold: first, we observed in 
Reluplex that the repeated translation of queries into GLPK and extraction of 
results from GLPK was a limiting factor on performance; and second, a black 
box simplex solver did not afford the flexibility we needed in the context of DNN 
verification. For example, in a standard simplex solver, variable assignments are 
typically pressed against their upper or lower bounds, whereas in the context of 
a DNN, other assignments might be needed to satisfy the non-linear constraints. 
Another example is the deduction capability, which is crucial for efficiently ver- 
ifying a DNN and whose effectiveness might depend on the internal state of the 
simplex solver. 


2.2 Piecewise-Linear Constraints (PiecewiseLinearConstraint Class)


Throughout its execution, Marabou maintains a set of piecewise-linear con- 
straints that represent the DNN’s non-linear functions. In iterations devoted to 
satisfying these constraints, Marabou looks for any constraints that are not sat- 
isfied by the current assignment. If such a constraint is found, Marabou changes 
the assignment in a way that makes that constraint satisfied. Alternatively, in 
order to guarantee eventual termination, if Marabou detects that a certain con- 
straint is repeatedly not satisfied, it may perform a case-split on that constraint: 
a process in which the piecewise-linear constraint φ is replaced by an equivalent
disjunction of linear constraints c₁ ∨ ... ∨ cₙ. Marabou considers these disjuncts
one at a time and checks for satisfiability. If the problem is satisfiable when φ is
replaced by some cᵢ, then the original problem is also satisfiable; if it is unsatisfiable
for every cᵢ, then the original problem is unsatisfiable.

In our implementation, piecewise-linear constraints are represented by 
objects of classes that inherit from the PiecewiseLinearConstraint abstract class. 
Currently the two supported instances are ReLU and Max, but the design is mod- 
ular in the sense that new constraint types can easily be added. PiecewiseLin- 
earConstraint defines the interface methods that each supported piecewise-linear 
constraint needs to implement. Some of the key interface methods are: 


— satisfied(): the constraint object needs to answer whether or not it is satisfied
given the current assignment. For example, for a constraint y = ReLU(x) and
assignment x = y = 3, satisfied() would return true; whereas for assignment
x = −5, y = 3, it would return false.

— getPossibleFixes(): if the constraint is not satisfied by the current assignment,
this method returns possible changes to the assignment that would correct the
violation. For example, for x = −5, y = 3, the ReLU constraint from before
might propose two possible changes to the assignment, x ← 3 or y ← 0, as
either would satisfy y = ReLU(x).

— getCaseSplits(): this method asks the piecewise-linear constraint φ to return
a list of linear constraints c₁, ..., cₙ, such that φ is equivalent to c₁ ∨ ... ∨ cₙ.
For example, when invoked for a constraint y = max(x₁, x₂), getCaseSplits()
would return the linear constraints c₁ : (y = x₁ ∧ x₁ ≥ x₂) and
c₂ : (y = x₂ ∧ x₂ ≥ x₁). These constraints satisfy the requirement that the original
constraint is equivalent to c₁ ∨ c₂.

— getEntailedTightenings(): as part of Marabou’s deduction of tighter variable
bounds, piecewise-linear constraints are repeatedly informed of changes to the
lower and upper bounds of variables they affect. Invoking getEntailedTightenings()
queries the constraint for tighter variable bounds, based on current
information. For example, suppose a constraint y = ReLU(x) is informed of
the upper bounds x ≤ 5 and y ≤ 7; in this case, getEntailedTightenings()
would return the tighter bound y ≤ 5.
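A Python rendering of this interface, with ReLU and Max instances that reproduce the examples above, may make the contract concrete. It is only a sketch: Marabou's actual classes are written in C++, the snake_case method names here merely mirror the ones listed above, and the encodings of assignments, bounds, and linear case-split constraints as dictionaries and tuples are our own assumptions.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Sequence, Tuple, Union

Assignment = Dict[str, float]
Bounds = Dict[str, Tuple[float, float]]  # variable -> (lower, upper)
# Our own encoding of linear atoms: ("eq", v, t) means v = t, ("ge", v, t) means
# v >= t, ("le", v, t) means v <= t, where t is a variable name or a constant.
Atom = Tuple[str, str, Union[str, float]]
Tightening = Tuple[str, str, float]      # (variable, "lb" or "ub", new bound)


class PiecewiseLinearConstraint(ABC):
    @abstractmethod
    def satisfied(self, a: Assignment) -> bool: ...

    @abstractmethod
    def get_possible_fixes(self, a: Assignment) -> List[Tuple[str, float]]: ...

    @abstractmethod
    def get_case_splits(self) -> List[List[Atom]]: ...

    @abstractmethod
    def get_entailed_tightenings(self, bounds: Bounds) -> List[Tightening]: ...


class ReLU(PiecewiseLinearConstraint):
    """y = max(0, x)."""

    def __init__(self, x: str, y: str):
        self.x, self.y = x, y

    def satisfied(self, a):
        return a[self.y] == max(0.0, a[self.x])

    def get_possible_fixes(self, a):
        fixes = []
        if a[self.y] >= 0:                             # x <- y works only if y >= 0
            fixes.append((self.x, a[self.y]))
        fixes.append((self.y, max(0.0, a[self.x])))    # or y <- ReLU(x)
        return fixes

    def get_case_splits(self):
        return [[("eq", self.y, self.x), ("ge", self.x, 0.0)],  # active phase
                [("eq", self.y, 0.0), ("le", self.x, 0.0)]]     # inactive phase

    def get_entailed_tightenings(self, bounds):
        (xl, xu), (yl, yu) = bounds[self.x], bounds[self.y]
        out = []
        if max(0.0, xu) < yu:    # y <= max(0, upper(x))
            out.append((self.y, "ub", max(0.0, xu)))
        if max(0.0, xl) > yl:    # y >= max(0, lower(x))
            out.append((self.y, "lb", max(0.0, xl)))
        if yu < xu:              # x <= y always holds, so x <= upper(y)
            out.append((self.x, "ub", yu))
        return out


class Max(PiecewiseLinearConstraint):
    """y = max(x_1, ..., x_n)."""

    def __init__(self, y: str, xs: Sequence[str]):
        self.y, self.xs = y, list(xs)

    def satisfied(self, a):
        return a[self.y] == max(a[x] for x in self.xs)

    def get_possible_fixes(self, a):
        return [(self.y, max(a[x] for x in self.xs))]

    def get_case_splits(self):
        # One case per argument x_i: y = x_i and x_i >= x_j for every j != i.
        return [[("eq", self.y, xi)] + [("ge", xi, xj) for xj in self.xs if xj != xi]
                for xi in self.xs]

    def get_entailed_tightenings(self, bounds):
        lo = max(bounds[x][0] for x in self.xs)
        hi = max(bounds[x][1] for x in self.xs)
        yl, yu = bounds[self.y]
        out = []
        if hi < yu:
            out.append((self.y, "ub", hi))
        if lo > yl:
            out.append((self.y, "lb", lo))
        return out
```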


2.3 Constraint- and Network-Level Reasoning (RowBoundTightener, ConstraintBoundTightener and SymbolicBoundTightener Classes)


Effective deduction of tighter variable bounds is crucial for Marabou’s perfor- 
mance. Deduction is performed at the constraint level, by repeatedly examin- 
ing linear and piecewise-linear constraints to see if they imply tighter variable 
bounds; and also at the DNN-level, by leveraging the network’s topology. 

Constraint-level bound tightening is performed by querying the piecewise-linear
constraints for tighter bounds using the getEntailedTightenings() method.
Similarly, linear equations can also be used to deduce tighter bounds. For example,
the equation x = y + z and lower bounds x ≥ 0, y ≥ 1 and z ≥ 1
together imply the tighter bound x ≥ 2. As part of the simplex-based search,
Marabou repeatedly encounters many linear equations and uses them for bound 
tightening. 
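Bound tightening from a linear equation is interval arithmetic on the equation solved for each variable in turn. The sketch below assumes the equation is stored as a row Σ aᵥ·v = 0 and bounds as (lower, upper) pairs; it illustrates the idea and is not Marabou's RowBoundTightener code. Applied to the example above, it recovers x ≥ 2.

```python
import math
from typing import Dict, Tuple

Bounds = Dict[str, Tuple[float, float]]  # variable -> (lower, upper)


def tighten_row(row: Dict[str, float], bounds: Bounds) -> Bounds:
    """One pass of bound tightening for the equation sum(a_v * v) == 0.
    Each variable v is isolated as v = sum_{u != v} (-a_u / a_v) * u, and the
    interval of the right-hand side is intersected with v's current bounds."""
    new = dict(bounds)
    for v, a in row.items():
        if a == 0:
            continue
        lo = hi = 0.0
        for u, b in row.items():
            if u == v or b == 0:
                continue
            c = -b / a
            u_lo, u_hi = bounds[u]
            lo += c * u_lo if c > 0 else c * u_hi
            hi += c * u_hi if c > 0 else c * u_lo
        old_lo, old_hi = new[v]
        new[v] = (max(old_lo, lo), min(old_hi, hi))
    return new


# x = y + z, written as x - y - z = 0, with x >= 0, y >= 1, z >= 1.
bounds = {"x": (0.0, math.inf), "y": (1.0, math.inf), "z": (1.0, math.inf)}
print(tighten_row({"x": 1.0, "y": -1.0, "z": -1.0}, bounds)["x"])  # (2.0, inf)
```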
Several recent papers have proposed verification schemes that rely on
DNN-level reasoning [5,23]. Marabou supports this kind of reasoning as well, by
storing the initial network topology and performing deduction steps that use this
information as part of its iterative search. DNN-level reasoning is seamlessly 
integrated into the search procedure by (1) initializing the DNN-level reasoners 
with the most up-to-date information discovered during the search, such as vari- 
able bounds and the state of piecewise-linear constraints; and (2) feeding any 
new information that is discovered back into the search procedure. Presently 
Marabou implements a symbolic bound tightening procedure [23]: based on net- 
work topology, upper and lower bounds for each hidden neuron are expressed 
as a linear combination of the input neurons. Then, if the bounds on the input 
neurons are sufficiently tight (e.g., as a result of past deductions), these expres- 
sions for upper and lower bounds may imply that some of the hidden neurons’ 
piecewise-linear activation functions are now restricted to one of their linear 
segments. Implementing additional DNN-level reasoning operations is work in 
progress. 
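The following sketch shows the idea for a fully connected ReLU network, in the style of ReluVal [23]: each neuron carries a symbolic lower and upper bound that is a linear expression over the inputs, and the input box then decides whether a ReLU is fixed to one linear segment. Unstable ReLUs are handled with the simplest sound choice (lower expression 0, upper expression replaced by its concrete maximum). The function names and this concretization rule are our own simplifications, not Marabou's SymbolicBoundTightener.

```python
from typing import List, Sequence, Tuple

LinExpr = Tuple[List[float], float]  # (coefficients over the inputs, constant)
Box = Sequence[Tuple[float, float]]  # (low, high) per input


def expr_range(e: LinExpr, box: Box) -> Tuple[float, float]:
    """Exact min and max of a linear expression over an input box."""
    coeffs, const = e
    lo = const + sum(c * (l if c >= 0 else u) for c, (l, u) in zip(coeffs, box))
    hi = const + sum(c * (u if c >= 0 else l) for c, (l, u) in zip(coeffs, box))
    return lo, hi


def input_exprs(n: int) -> List[LinExpr]:
    """Input i is represented by the expression 1 * x_i + 0."""
    return [([1.0 if j == i else 0.0 for j in range(n)], 0.0) for i in range(n)]


def affine_layer(lowers, uppers, weights, biases):
    """Symbolic bounds of W n + b, given symbolic bounds of each neuron n_i."""
    n = len(lowers[0][0])
    new_l, new_u = [], []
    for row, bias in zip(weights, biases):
        lc, uc, lk, uk = [0.0] * n, [0.0] * n, bias, bias
        for w, le, ue in zip(row, lowers, uppers):
            src_l, src_u = (le, ue) if w >= 0 else (ue, le)  # negative weight swaps
            for i in range(n):
                lc[i] += w * src_l[0][i]
                uc[i] += w * src_u[0][i]
            lk += w * src_l[1]
            uk += w * src_u[1]
        new_l.append((lc, lk))
        new_u.append((uc, uk))
    return new_l, new_u


def relu_layer(lowers, uppers, box):
    """Classify each ReLU and return its phase plus symbolic output bounds."""
    n = len(box)
    phases, out_l, out_u = [], [], []
    for le, ue in zip(lowers, uppers):
        lo, hi = expr_range(le, box)[0], expr_range(ue, box)[1]
        if lo >= 0:       # always active: ReLU acts as the identity
            phases.append("active")
            out_l.append(le)
            out_u.append(ue)
        elif hi <= 0:     # always inactive: ReLU outputs 0
            phases.append("inactive")
            out_l.append(([0.0] * n, 0.0))
            out_u.append(([0.0] * n, 0.0))
        else:             # unstable: sound but coarse bounds [0, hi]
            phases.append("unstable")
            out_l.append(([0.0] * n, 0.0))
            out_u.append(([0.0] * n, hi))
    return phases, out_l, out_u


box = [(0.0, 1.0), (0.0, 1.0)]
x = input_exprs(2)
pre_l, pre_u = affine_layer(x, x, [[1, 1], [1, -1], [1, -1]], [1, 0, -2])
phases, h_l, h_u = relu_layer(pre_l, pre_u, box)
print(phases)  # ['active', 'unstable', 'inactive']
```

In this toy network, two of the three ReLUs are fixed to a single linear segment without any case split.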


2.4 The Engine (Engine and SmtCore Classes) 


The main class of Marabou, in which the main loop resides, is called the Engine. 
The engine stores and coordinates the various solution components, including 
the simplex core and the piecewise-linear constraints. The main loop consists, 
roughly, of the following steps (the first rule that applies is used): 


1. If a piecewise-linear constraint had to be fixed more than a certain number 
of times, perform a case split on that constraint. 

2. If the problem has become unsatisfiable, e.g. because for some variable a 
lower bound has been deduced that is greater than its upper bound, undo a 
previous case split (or return UNSAT if no such case split exists). 

3. If there is a violated linear constraint, perform a simplex step. 

4. If there is a violated piecewise-linear constraint, attempt to fix it. 

5. Return SAT (all constraints are satisfied). 


The engine also triggers deduction steps, both at the neuron level and at the 
network level, according to various heuristics. 
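The priority order of these rules can be captured by a single dispatch function. The sketch below only selects the next action; the `EngineState` fields and the split threshold are hypothetical, and the actions themselves (simplex steps, fixes, splits, backtracking) are left abstract.

```python
from dataclasses import dataclass

SPLIT_THRESHOLD = 5  # hypothetical value for "a certain number of times"


@dataclass
class EngineState:
    """Hypothetical summary of what the engine inspects in each iteration."""
    max_fix_count: int         # most fixes applied to any single PL constraint
    bounds_inconsistent: bool  # some variable has lower bound > upper bound
    open_case_splits: int      # case splits that can still be undone
    violated_linear: bool
    violated_pl: bool


def next_action(s: EngineState, threshold: int = SPLIT_THRESHOLD) -> str:
    """Apply the first matching rule of the main loop, in order."""
    if s.max_fix_count > threshold:                             # rule 1
        return "case_split"
    if s.bounds_inconsistent:                                   # rule 2
        return "undo_case_split" if s.open_case_splits else "UNSAT"
    if s.violated_linear:                                       # rule 3
        return "simplex_step"
    if s.violated_pl:                                           # rule 4
        return "fix_pl_constraint"
    return "SAT"                                                # rule 5
```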


2.5 The Divide-and-Conquer Mode and Concurrency (DnC.py) 


Marabou supports a divide-and-conquer (D&C) solving mode, in which the 
input region specified in the original query is partitioned into sub-regions. The 
desired property is checked on these sub-regions independently. The D&C mode 
naturally lends itself to parallel execution, by having each sub-query checked 
on a separate node. Moreover, the D&C mode can improve Marabou’s overall 
performance even when running sequentially: the total time of solving the sub- 
queries is often less than the time of solving the original query, as the smaller 
input regions allow for more effective deduction steps. 
Given a query φ, the solver maintains a queue Q of (query, timeout) pairs. Q is
initialized with one element (φ, T), where T, the initial timeout, is a configurable
parameter. To solve φ, the solver loops through the following steps:


1. Pop a pair (φ′, t′) from Q and attempt to solve φ′ with a timeout of t′.

2. If the problem is UNSAT and Q is empty, return UNSAT.

3. If the problem is UNSAT and Q is not empty, return to step 1.

4. If the problem is SAT, return SAT.

5. If a timeout occurred, split φ′ into k sub-queries φ′₁, ..., φ′ₖ by partitioning
its input region. For each sub-query φ′ᵢ, push (φ′ᵢ, m · t′) into Q.


The timeout factor m and the splitting factor k are configurable parameters. 
Splitting the query’s input region is performed heuristically. 
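The queue-based loop can be sketched directly from these steps. Everything problem-specific here is a hypothetical stand-in: the input region is a one-dimensional interval, splitting is plain equal-width bisection (Marabou's actual splitting is heuristic), and the solver is a toy whose running time grows with region width. The parameters T, k, and m play the roles described above.

```python
from collections import deque
from typing import Callable, List, Tuple

Region = Tuple[float, float]            # toy one-dimensional input region
Solver = Callable[[Region, float], str]  # returns "SAT", "UNSAT" or "TIMEOUT"


def bisect_into(region: Region, k: int) -> List[Region]:
    """Hypothetical splitting heuristic: k equal-width pieces."""
    lo, hi = region
    step = (hi - lo) / k
    return [(lo + i * step, lo + (i + 1) * step) for i in range(k)]


def divide_and_conquer(query: Region, solve: Solver, T: float, k: int, m: float) -> str:
    queue = deque([(query, T)])
    while queue:
        q, t = queue.popleft()            # step 1: pop and try to solve
        result = solve(q, t)
        if result == "UNSAT":
            if not queue:                 # step 2
                return "UNSAT"
            continue                      # step 3
        if result == "SAT":               # step 4
            return "SAT"
        for sub in bisect_into(q, k):     # step 5: timeout, so split
            queue.append((sub, m * t))
    raise AssertionError("unreachable: the queue is never empty here")


def toy_solver(bad_point: float, cost_per_unit_width: float = 10.0) -> Solver:
    """Stand-in solver: times out on wide regions, otherwise reports SAT
    iff the region contains the (hypothetical) counterexample `bad_point`."""
    def solve(region: Region, timeout: float) -> str:
        lo, hi = region
        if (hi - lo) * cost_per_unit_width > timeout:
            return "TIMEOUT"
        return "SAT" if lo <= bad_point <= hi else "UNSAT"
    return solve


print(divide_and_conquer((0.0, 1.0), toy_solver(0.7), T=1.0, k=2, m=1.5))  # SAT
print(divide_and_conquer((0.0, 1.0), toy_solver(2.0), T=1.0, k=2, m=1.5))  # UNSAT
```

The FIFO order (all sub-queries of one level before any deeper split) is our choice; the text does not specify the order, and with parallel workers the sub-queries would simply be handed out as they are popped.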


2.6 Input Interfaces (AcasParser class, maraboupy Folder) 
Marabou supports verification queries provided through the following interfaces: 


— Native Marabou format: a user prepares a query using the Marabou C++ 
interface, compiles the query into the tool, and runs it. This format is useful 
for integrating Marabou into a larger framework. 

— Marabou executable: a user runs a Marabou executable, and passes to it 
command-line parameters indicating the network and property files to be 
checked. Currently, network files are encoded using the NNet format [11], 
and the properties are given in a simple textual format. 

— Python/TensorFlow interface: the query is passed to Marabou through 
Python constructs. The Python interface can also handle DNNs stored as
TensorFlow protobuf files. 


3 Evaluation 


For our evaluation we used the ACAS Xu [12], CollisionDetection [4] and 
TwinStream [1] families of benchmarks. Tool-wise, we considered the Reluplex tool, which is the most closely related to Marabou, as well as ReluVal [23] and Planet [4]. The version of Marabou used for the evaluation is available online [14].

The top left plot in Fig. 3 compares the execution times of Marabou and Reluplex on 180 ACAS Xu benchmarks with a 1 hour timeout. We used Marabou in D&C mode with 4 cores and with $T = 5$, $k = 4$, and $m = 1.5$. The remaining three plots depict an execution time comparison between Marabou D&C (configured as above), ReluVal and Planet, using 4 cores and a 1 hour timeout. Marabou and ReluVal are evaluated over 180 ACAS Xu benchmarks (top right plot), and Marabou and Planet are evaluated on those 180 benchmarks (bottom left plot) and also on 500 CollisionDetection and 81 TwinStream benchmarks (bottom right plot). Due to technical difficulties, ReluVal was not run on the CollisionDetection and TwinStream benchmarks. The results show that in a 4-core setting Marabou generally outperforms Planet, but generally does not outperform ReluVal (though it does better on some benchmarks). These results highlight the need for additional DNN-level reasoning in Marabou, which is a key ingredient in ReluVal's verification procedure.
Figure 2 shows the average runtime of Marabou and ReluVal on the ACAS Xu properties, as a function of the number of available cores. We see that as the number of cores increases, Marabou (solid) is able to close the gap with, and sometimes outperform, ReluVal (dotted). With 64 cores, Marabou outperforms ReluVal on average, and both solvers were able to solve all ACAS Xu benchmarks
Fig. 3. A comparison of Marabou with Reluplex, ReluVal and Planet.


4 Conclusion 


DNN analysis is an emerging field, and Marabou is a step towards a more mature, 
stable verification platform. Moving forward, we plan to improve Marabou in sev- 
eral dimensions. Part of our motivation in implementing a custom simplex solver 
was to obtain the needed flexibility for fusing together the solving process for lin- 
ear and non-linear constraints. Currently, this flexibility has not been leveraged 
much, as these pieces are solved relatively separately. We expect that by tackling 
both kinds of constraints simultaneously, we will be able to improve performance
significantly. Other enhancements we wish to add include: additional network- 
level reasoning techniques based on abstract interpretation; better heuristics for 
both the linear and non-linear constraint solving engines; and additional engi- 
neering improvements, specifically within the simplex engine. 
Abstract. Probabilistic bisimulation is a fundamental notion of process equivalence for probabilistic systems. It has important applications, including the formalisation of the anonymity property of several communication protocols. While
there is a large body of work on verifying probabilistic bisimulation for finite 
systems, the problem is in general undecidable for parameterized systems, i.e., 
for infinite families of finite systems with an arbitrary number n of processes. 
In this paper we provide a general framework for reasoning about probabilistic 
bisimulation for parameterized systems. Our approach is in the spirit of software 
verification, wherein we encode proof rules for probabilistic bisimulation and use 
a decidable first-order theory to specify systems and candidate bisimulation rela- 
tions, which can then be checked automatically against the proof rules. 

We work in the framework of regular model checking, and specify an infinite- 
state system as a regular relation described by a first-order formula over a uni- 
versal automatic structure, i.e., a logical theory over the string domain. For prob- 
abilistic systems, we show how probability values (as well as the required oper- 
ations) can be encoded naturally in the logic. Our main result is that one can 
specify the verification condition of whether a given regular binary relation is 
a probabilistic bisimulation as a regular relation. Since the first-order theory of 
the universal automatic structure is decidable, we obtain an effective method 
for verifying probabilistic bisimulation for infinite-state systems, given a regu- 
lar relation as a candidate proof. As a case study, we show that our framework 
is sufficiently expressive for proving the anonymity property of the parameter- 
ized dining cryptographers protocol and the parameterized grades protocol. Both 
of these protocols hitherto could not be verified by existing automatic methods. 
Moreover, with the help of standard automata learning algorithms, we show that
the candidate relations can be synthesized fully automatically, making the verifi- 
cation fully automated. 


1 Introduction 


Equivalence checking using bisimulation relations plays a fundamental role in formal 
verification. Bisimulation is the basis for substitutability of systems: if two systems are 
bisimilar, their behaviors are the same and they satisfy the same formulas in expressive 
temporal logics. The notion of bisimulation is defined both for deterministic [39] and 
for probabilistic transition systems [34]. In both contexts, checking bisimulation has 
many applications, such as proving correctness of anonymous communication proto- 
cols [15], reasoning about knowledge [22], program optimization [32], and optimiza- 
tions for computational problems (e.g. language equivalence and minimization) of finite 
automata [12]. 

The problem of checking bisimilarity of two given systems has been widely studied. It is decidable in polynomial time for both probabilistic and non-probabilistic finite-state systems [6,17,20,52]. These algorithms form the basis of practical tools for checking bisimulation. For infinite-state systems, such as parameterized versions of communication protocols (i.e. infinite families of finite-state systems with an arbitrary number n of processes), the problem is undecidable in general. Most research hitherto has focused on identifying decidable subcases (e.g. strong bisimulations for pushdown systems for probabilistic and non-probabilistic cases [25,47,48]), rather than on providing tool support for practical problems.

In this paper, we propose a first-order verification approach, inspired by software verification techniques, for reasoning about bisimilarity for infinite-state systems. In our approach, we provide first-order logic proof rules to determine if a given binary relation is a bisimulation. To this end, we must find an encoding of systems and relations and a decidable first-order theory that can formalize the system, the property, and the proof rules. We propose to use the decidable first-order theory of the universal automatic structure [8,10]. Informally, the domain of the theory is a set of words over a finite alphabet $\Sigma$, and it captures the first-order theory of the infinite $|\Sigma|$-ary tree with a relation that relates strings of the same level. The theory can express precisely the class of all regular relations [8] (a.k.a. automatic relations [10]), which are relations $R(x_1, \ldots, x_k)$ over strings $\Sigma^*$ that can be recognized by synchronous multi-tape automata. It is also sufficiently powerful to capture many classes of non-probabilistic infinite-state systems and regular model checking [3,13,49-51].

We demonstrate the effectiveness of the approach by encoding and automatically 
verifying some challenging examples from the literature of parameterized systems in 
our logic: the anonymity property of the parameterized dining cryptographers protocol 
[16] and the grades protocol [29]. These examples were only automatically verified 
for some fixed parameters using finite-state model checkers or equivalence checkers 
(e.g. see [28,29]). Just as invariant verification for software separates out the proof 
rules (verification conditions in a decidable logic) from the synthesis of invariants, we 
separate out proof rules for bisimulation from the synthesis of bisimulation relations. 
We demonstrate how recent developments in generating and refining candidate proofs
as automata (e.g. [18,26,27,37,38,40,41,53]) can be used to automate the search of 
proofs, making our verification fully “push button.” 


Contributions. Our contributions are as follows. First, we show how probabilistic 
infinite-state systems can be faithfully encoded in the first-order theory of universal 
automatic structure. In the past, the theory has been used to reason about qualitative 
liveness of weakly-finite MDPs (e.g. see [36,37]), which allows the authors to disre- 
gard the actual non-zero probability values. To the best of our knowledge, no encoding 
of probabilistic transition systems in the theory was available. In order to be able to 
effectively encode probabilistic systems, our theory should typically be two-sorted: one 
sort for encoding the configurations, and the other for encoding the probability values. 
We show how both sorts (and the operations required for the sorts) can be encoded 
in the universal automatic structure, which requires only the domain of strings. In the 
sequel, such transition systems will be called regular transition systems. 

Second, using the minimal probability assumption on the transition systems [34] (i.e. there exists an $\varepsilon > 0$ such that any non-zero transition probability is at least $\varepsilon$), which is often satisfied in practice, we show how the verification condition of whether a given regular binary relation is a probabilistic bisimulation can be encoded in the theory. The decidability of the first-order theory over the universal automatic structure gives us an effective means of checking probabilistic bisimulation for regular transition systems. In fact, the theory can be easily reduced to the weak monadic theory WS1S of one successor (therefore allowing the use of highly optimized tools like Mona [31] and Gaston [23]) by interpreting finite words as finite sets (e.g. see [19,46]).

Our framework requires the encoding of the systems and the proofs in the first-order 
theory of the universal automatic structure. Which interesting examples can it capture? 
Our third contribution is to provide two examples from the literature of parameterized verification: the anonymity property of the parameterized dining cryptographers protocol [16] and of the parameterized grades protocol [29]. We study two versions of the dining cryptographers protocol in this paper: the classical version, where the secrets are single bits, and a generalized version, where the secrets are bit-vectors of arbitrary length.

Thus far, our framework requires a candidate proof to be supplied by the user. Our 
final contribution is to demonstrate how standard techniques from the synthesis litera- 
ture (e.g. automata learning [18,26,27,37,38,40,41,53]) can be used to fully automate 
the proof search. Using automata learning, we successfully pinpoint regular proofs for 
the anonymity property of the three protocols: the two dining cryptographers protocols 
are verified in 6 and 28 s, respectively, and the grades protocol in 35 s. 


Other Related Work. The verification framework we use in this paper can be construed 
as a regular model checking [3] framework using regular relations. The framework uses 
first-order logic as the language, which makes it convenient to express many verification 
conditions (as is well-known from first-order theorem proving [14]). The use of the 
universal automatic structure allows us to express two different sorts (configurations 
and probability values) in one sort (i.e. strings). Most work in regular model checking 
focuses on safety and liveness properties (e.g. [2,3,11,13,27,36,37,40,42,49,51,53]).

Some automated techniques can prove the anonymity property of the dining cryptographers protocol and the grades protocol in the finite case, e.g., the PRISM model checker [28,45] and language equivalence checking with the tool APEX [29]. To the best of our knowledge, our method is the first automated technique proving the anonymity property of the protocols in the parameterized case.

Our work is in the spirit of deductive software verification (e.g., [4,14,24,35,43,44]),
where one provides inductive invariants manually, and a tool automatically checks cor- 
rectness of the candidate invariants. In theory, our result yields a fully-automatic proce- 
dure by enumerating all candidate regular proofs, and at the same time enumerating all 
candidate counterexamples (note that we avoid undecidability by restricting attention to 
proofs encodable as regular relations). In our implementation, we use recent advances 
in automata-learning based synthesis to efficiently encode the search [18,37]. 


2 Preliminaries 


General Notation. We use $\mathbb{N}$ to denote non-negative integers. Given $a, b \in \mathbb{R}$, we use the standard notation $[a,b] := \{c \in \mathbb{R} : a \le c \le b\}$ to denote real intervals. Given a set $S$, we use $S^*$ to denote the set of all finite sequences of elements from $S$. The set $S^*$ always includes the empty sequence, which we denote by $\epsilon$. We call a function $f : S \to [0,1]$ a probability distribution over $S$ if $\sum_{s \in S} f(s) = 1$. We shall use $\mathbb{I}_s$ to denote the probability distribution $f$ with $f(s) = 1$, and $\mathcal{D}_S$ to denote the set of probability distributions over $S$. Given a function $f : X_1 \times \cdots \times X_n \to Y$, the graph of $f$ is the relation $\{(x_1, \ldots, x_n, f(x_1, \ldots, x_n)) : \forall i \in \{1, \ldots, n\}.\ x_i \in X_i\}$. Whenever a relation $R$ is an equivalence relation over a set $S$, we use $S/R$ to denote the set of equivalence classes created by $R$. Depending on the context, we may use $p \mathrel{R} q$ or $R(p,q)$ to denote $(p,q) \in R$.


Words and Automata. We assume basic familiarity with word automata. Fix a finite alphabet $\Sigma$. For each finite word $w := w_1 \ldots w_n \in \Sigma^*$, we write $w[i,j]$, where $1 \le i \le j \le n$, to denote the segment $w_i \ldots w_j$. Given an automaton $\mathcal{A} := (\Sigma, Q, \delta, q_0, F)$, a run of $\mathcal{A}$ on $w$ is a function $\rho : \{0, \ldots, n\} \to Q$ with $\rho(0) = q_0$ that obeys the transition relation $\delta$. We may also denote the run $\rho$ by the word $\rho(0) \cdots \rho(n)$ over the alphabet $Q$. The run $\rho$ is said to be accepting if $\rho(n) \in F$, in which case we say that the word $w$ is accepted by $\mathcal{A}$. The language $L(\mathcal{A})$ of $\mathcal{A}$ is the set of words in $\Sigma^*$ accepted by $\mathcal{A}$.
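Because a run only has to obey $\delta$, an automaton may have several runs on the same word. The sketch below decides acceptance by tracking, letter by letter, the set of all states $\rho(i)$ that some run can be in (the standard subset simulation). The dictionary representation of $\delta$ and the function names are our own choices.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

State = Hashable
Transitions = Dict[Tuple[State, str], Set[State]]


def accepts(delta: Transitions, q0: State, final: Set[State],
            word: Iterable[str]) -> bool:
    """True iff some run rho of the automaton on `word` has rho(n) in `final`.

    `current` holds every state rho(i) reachable by a run on the first i
    letters, so all runs are explored simultaneously."""
    current = {q0}                                   # rho(0) = q0
    for letter in word:
        current = {r for q in current for r in delta.get((q, letter), set())}
        if not current:                              # no run can be extended
            return False
    return bool(current & final)


# Example: a one-state automaton for the language 1*.
ONES: Transitions = {("s", "1"): {"s"}}
```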


Transition Systems. We fix a set $\mathrm{ACT}$ of action symbols. A transition system over $\mathrm{ACT}$ is a tuple $\mathfrak{S} := (S; \{\to_a\}_{a \in \mathrm{ACT}})$, where $S$ is a set of configurations and $\to_a \subseteq S \times S$ is a binary relation over $S$. We use $\to$ to denote the relation $\bigcup_{a \in \mathrm{ACT}} \to_a$. We say that a sequence $s_1 \to \cdots \to s_{n+1}$ is a path in $\mathfrak{S}$ if $s_1, \ldots, s_{n+1} \in S$ and $s_i \to s_{i+1}$ for $i \in \{1, \ldots, n\}$. A transition system is called bounded branching if the number of configurations reachable from a configuration in one step is bounded. Formally, this means that there exists an a priori integer $N$ such that for all $s \in S$, $|\{s' \in S : s \to s'\}| \le N$.


Probabilistic Transition Systems. A probabilistic transition system (PTS) [34] is a structure $\mathfrak{S} := (S; \{\delta_a\}_{a \in \mathrm{ACT}})$, where $S$ is a set of configurations and $\delta_a : S \to \mathcal{D}_S \cup \{\mathbf{0}\}$ maps each configuration to either a probability distribution or the zero function $\mathbf{0}$. Here $\delta_a(s) = \mathbf{0}$ simply means that $s$ is a “dead end” for action $a$. We shall use $\delta_a(s,s')$ to denote $\delta_a(s)(s')$. In this paper, we always assume that $\delta_a(s,s')$ is a rational number and $|\{s' : \delta_a(s,s') \neq 0\}| < \infty$. The underlying transition graph of a PTS is a transition system $(S; \{\to_a\}_{a \in \mathrm{ACT}})$ such that $s \to_a s'$ iff $\delta_a(s,s') \neq 0$.

It is standard (e.g. see [34]) to impose the minimal probability assumption on the PTSs that we shall be dealing with, i.e., there is an $\varepsilon > 0$ such that any transition with a non-zero probability $p$ satisfies $p \ge \varepsilon$. This assumption is practically sensible since it is satisfied by most PTSs that we deal with in practice (e.g. finite PTSs, probabilistic pushdown automata [21], and most examples from probabilistic parameterized systems [36,37], including our examples from Sect. 5). The minimal probability assumption, among other things, implies that the PTS is bounded-branching (i.e. that its underlying transition graph is bounded-branching). In the sequel, we shall adopt this assumption.
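The implication from the minimal probability assumption to bounded branching is a counting argument. For a configuration $s$ and an action $a$ with $\delta_a(s) \neq \mathbf{0}$, every non-zero term of the distribution $\delta_a(s)$ is at least $\varepsilon$, so, assuming here that $\mathrm{ACT}$ is finite,

$$1 \;=\; \sum_{s' \in S} \delta_a(s,s') \;\ge\; \varepsilon \cdot \big|\{s' : s \to_a s'\}\big| \quad\Longrightarrow\quad \big|\{s' : s \to s'\}\big| \;\le\; \sum_{a \in \mathrm{ACT}} \big|\{s' : s \to_a s'\}\big| \;\le\; |\mathrm{ACT}| \cdot \lfloor 1/\varepsilon \rfloor.$$

Dead ends contribute no successors, so $N := |\mathrm{ACT}| \cdot \lfloor 1/\varepsilon \rfloor$ is a valid branching bound.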


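Probabilistic Bisimulations. Let $\mathfrak{S} := (S; \{\delta_a\}_{a \in \mathrm{ACT}})$ be a PTS. We write $s \xrightarrow{\mu}_a S'$ if $\sum_{s' \in S'} \delta_a(s,s') = \mu$. A probabilistic bisimulation for $\mathfrak{S}$ is an equivalence relation $R$ over $S$ such that $(p,q) \in R$ implies

$$\forall a \in \mathrm{ACT}.\ \forall S' \in S/R.\ \forall \mu \in [0,1].\ \big(p \xrightarrow{\mu}_a S' \iff q \xrightarrow{\mu}_a S'\big). \tag{1}$$

We say that $p$ and $q$ are probabilistic bisimilar (written as $p \sim q$) if there is a probabilistic bisimulation $R$ such that $(p,q) \in R$. We can compute probabilistic bisimulation between two PTSs $\mathfrak{S} := (S; \{\delta_a\}_{a \in \mathrm{ACT}})$ and $\mathfrak{S}' := (S'; \{\delta'_a\}_{a \in \mathrm{ACT}})$ by computing a probabilistic bisimulation $R$ for the disjoint union of $\mathfrak{S}$ and $\mathfrak{S}'$, which is defined as $\mathfrak{S} \uplus \mathfrak{S}' := (S \uplus S'; \{\delta''_a\}_{a \in \mathrm{ACT}})$, where $\delta''_a(s) := \delta_a(s)$ for $s \in S$, and $\delta''_a(s) := \delta'_a(s)$ for $s \in S'$. In such a case, we say $R$ is a probabilistic bisimulation between $\mathfrak{S}$ and $\mathfrak{S}'$.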
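For a finite PTS, condition (1) can be checked directly by summing transition probabilities into each equivalence class. The sketch below does this under illustrative assumptions of our own: $S$ and $\mathrm{ACT}$ are finite, $\delta$ is a nested dict of exact `Fraction` probabilities, and $R$ is a set of pairs. It is not the symbolic procedure developed in this paper, which targets infinite-state systems.

```python
from fractions import Fraction
from typing import Dict, FrozenSet, Hashable, Iterable, Set, Tuple

Config = Hashable
# delta[a][s] maps each successor of s to its probability; a missing entry
# for s means delta_a(s) = 0, i.e. s is a dead end for action a.
PTS = Dict[str, Dict[Config, Dict[Config, Fraction]]]
Relation = Set[Tuple[Config, Config]]


def relation_from_partition(blocks: Iterable[Iterable[Config]]) -> Relation:
    """The equivalence relation whose classes are the given blocks."""
    return {(s, t) for b in map(list, blocks) for s in b for t in b}


def is_equivalence(S: Set[Config], R: Relation) -> bool:
    return (all((s, s) in R for s in S)
            and all((t, s) in R for (s, t) in R)
            and all((s, u) in R for (s, t) in R for (t2, u) in R if t == t2))


def quotient(S: Set[Config], R: Relation) -> Set[FrozenSet[Config]]:
    """S/R: the set of equivalence classes of R."""
    return {frozenset(t for t in S if (s, t) in R) for s in S}


def mass(delta: PTS, a: str, s: Config, block: FrozenSet[Config]) -> Fraction:
    """The mu with s ->^mu_a block: sum of delta_a(s, s') over s' in block."""
    succ = delta.get(a, {}).get(s, {})
    return sum((p for t, p in succ.items() if t in block), Fraction(0))


def is_prob_bisimulation(S: Set[Config], delta: PTS, R: Relation) -> bool:
    """R is an equivalence and every pair in R satisfies condition (1)."""
    if not is_equivalence(S, R):
        return False
    classes = quotient(S, R)
    return all(mass(delta, a, p, C) == mass(delta, a, q, C)
               for (p, q) in R for a in delta for C in classes)
```

For instance, a fair coin toss into two dead ends is bisimilar to a single certain step into a dead end once all three dead ends are put in one class, but not when the targets are separated.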


3 Framework of Regular Relations 


In this section we describe the framework of regular relations for specifying proba- 
bilistic infinite-state systems, properties to verify, and proofs, all in a uniform symbolic 
way. The framework is amenable to automata-theoretic algorithms in the spirit of regu- 
lar model checking [3,13]. 

The framework of regular relations [8] (a.k.a. automatic relations [9]) uses the first-order theory of the universal¹ automatic structure

$$\mathfrak{U} := (\Sigma^*; \preceq, \mathit{eqL}, \{l_a\}_{a \in \Sigma}), \tag{2}$$

where $\Sigma$ is some finite alphabet, $\preceq$ is the (non-strict) prefix-of relation, $\mathit{eqL}$ is the binary equal-length predicate, and $l_a$ is a unary predicate asserting that the last letter of the word is $a$. The domain of the structure is the set of finite words over $\Sigma$, and for words $w, w' \in \Sigma^*$, we have $w \preceq w'$ iff there is some $w'' \in \Sigma^*$ such that $w \cdot w'' = w'$, $\mathit{eqL}(w,w')$ iff $|w| = |w'|$, and $l_a(w)$ iff there is some $w'' \in \Sigma^*$ such that $w = w'' \cdot a$.
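Read on concrete Python strings, the three primitive predicates of $\mathfrak{U}$ are one-liners. The sketch also derives the successor relation $\prec_{\mathrm{succ}}$ (used later as syntactic sugar) from $\preceq$ alone, mirroring the first-order definition $w \prec_{\mathrm{succ}} w' \iff w \prec w' \wedge \neg\exists v.\ (w \prec v \wedge v \prec w')$, where $\prec$ is the strict prefix relation. The function names are illustrative.

```python
def prefix(w: str, w2: str) -> bool:
    """w ≼ w2: some w'' satisfies w + w'' == w2 (non-strict prefix)."""
    return w2.startswith(w)


def eq_len(w: str, w2: str) -> bool:
    """eqL(w, w2): |w| == |w2|."""
    return len(w) == len(w2)


def last_letter(a: str, w: str) -> bool:
    """l_a(w): w == w'' + a for some w''."""
    return len(w) > 0 and w[-1] == a


def succ(w: str, w2: str) -> bool:
    """w ≺_succ w2: w ≺ w2 strictly, with no v strictly in between.

    Any v ≼ w2 is a prefix w2[:i], so only those candidates are checked."""
    def strict(x: str, y: str) -> bool:
        return prefix(x, y) and x != y

    return strict(w, w2) and not any(
        strict(w, w2[:i]) and strict(w2[:i], w2) for i in range(len(w2) + 1))
```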

Next, we discuss the expressive power of first-order formulas over the universal automatic structure, and decision procedures for satisfiability of such formulas. In Sect. 4, we shall describe: (1) how to specify a PTS as a first-order formula in $\mathfrak{U}$, and (2) how to specify the verification condition for the probabilistic bisimulation property in this theory. In Sect. 5, we shall show that the theory is sufficiently powerful for capturing probabilistic bisimulations for interesting examples.


¹ Here, “universal” simply means that all automatic structures are first-order interpretable in this structure.
Expressiveness and Decidability. The framework is called “regular” because the relations first-order definable in $\mathfrak{U}$ coincide with the regular relations, i.e., relations definable by synchronous automata. More precisely, for a formula $\varphi(x_1, \ldots, x_k)$ we define $[\![\varphi]\!]$ as the relation which contains all tuples $(w_1, \ldots, w_k) \in (\Sigma^*)^k$ such that $\mathfrak{U} \models \varphi(w_1, \ldots, w_k)$. In addition, we define the convolution $w_1 \otimes \cdots \otimes w_k$ of words $w_1, \ldots, w_k \in \Sigma^*$ as the word $w$ of length $\max_j |w_j|$ over $\Sigma_\perp^k$ (where $\perp \notin \Sigma$ and $\Sigma_\perp := \Sigma \cup \{\perp\}$) such that $w[i] = (a_1, \ldots, a_k)$ with

$$a_j = \begin{cases} w_j[i] & \text{if } |w_j| \ge i, \\ \perp & \text{otherwise.} \end{cases}$$

In other words, $w$ is obtained by juxtaposing $w_1, \ldots, w_k$ and padding the shorter words with $\perp$. For example, $010 \otimes 00 = (0,0)(1,0)(0,\perp)$. A $k$-ary relation $R$ over $\Sigma^*$ is regular if the set $\{w_1 \otimes \cdots \otimes w_k : (w_1, \ldots, w_k) \in R\}$ is a regular language over the alphabet $\Sigma_\perp^k$. The relationship between $\mathfrak{U}$ and regular relations can be formally stated as follows.


Proposition 1 ([8-10]).

1. Given a formula $\varphi(\bar{x})$ over $\mathfrak{U}$, the relation $[\![\varphi]\!]$ is effectively regular. Conversely, given a regular relation $R$, we can compute a formula $\varphi(\bar{x})$ over $\mathfrak{U}$ such that $[\![\varphi]\!] = R$.

2. The first-order theory of $\mathfrak{U}$ is decidable.


The decidability of the first-order theory of $\mathfrak{U}$ follows using a standard automata-theoretic algorithm (e.g. see [9,49]).

In the sequel, we shall also use the term regular relations to denote relations definable in $\mathfrak{U}$. In addition, to avoid notational clutter, we shall freely use other regular relations (e.g. the successor relation $\prec_{\mathrm{succ}}$ of the prefix relation $\preceq$, and membership in a regular language) as syntactic sugar.
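The convolution and the notion of a regular relation can be made concrete with a short sketch. It computes $w_1 \otimes \cdots \otimes w_k$ as a tuple of letters and shows that the prefix relation $\preceq$ is regular: a one-state automaton reading $w \otimes w'$ only needs to check that each letter $(a, b)$ satisfies $a = b$ or $a = \perp$. Using the character ⊥ as the padding symbol, and the function names, are our own choices.

```python
from typing import Tuple

PAD = "⊥"  # padding symbol; assumed not to be a letter of the alphabet Σ


def convolution(*words: str) -> Tuple[Tuple[str, ...], ...]:
    """w1 ⊗ ... ⊗ wk: position i holds the i-th letter of every word, or PAD."""
    n = max((len(w) for w in words), default=0)
    return tuple(tuple(w[i] if i < len(w) else PAD for w in words)
                 for i in range(n))


def prefix_via_convolution(w: str, w2: str) -> bool:
    """≼ is regular: a one-state automaton reading w ⊗ w2 accepts iff
    every letter (a, b) has a == b or a == PAD."""
    return all(a == b or a == PAD for a, b in convolution(w, w2))
```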

We note that the first-order theory of $\mathfrak{U}$ can also be reduced to the weak monadic theory WS1S of one successor (therefore allowing the use of highly optimized tools like MONA [31] and Gaston [23]) by translating finite words to finite sets. The relationship between
the universal automatic structure and WS1S can be made precise using the notion of 
finite-set interpretations [19,46]. 


4 Probabilistic Bisimilarity Within Regular Relations 


In this section, we show how the framework of regular relations can be used to encode 
a PTS, and the corresponding proof rules for probabilistic bisimulation. 


4.1 Specifying a Probabilistic Transition System 


Since we assume that all probability values specified in our systems are rational numbers, the fact that our PTS is bounded-branching implies that we can specify the probability values by natural weights (by multiplying the probability values by the least common multiple of the denominators). For example, if a configuration $c$ has an action toss that takes it to $c_1$ and $c_2$, each with probability 1/2, then the new system simply changes both values of 1/2 to 1. This is a known trick in the literature of probabilistic verification (e.g. see [1]). Therefore, we can now assume that the transition probability functions have range $\mathbb{N}$. The challenge now is that our encoding of a PTS in the universal automatic structure must encode two different sorts as words over a finite alphabet $\Sigma$: configurations and natural weights.
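For a finite collection of rational probabilities, such as those in Examples 1 and 2 below, the normalization amounts to one least-common-multiple computation. A minimal sketch using Python's `fractions` and `math.lcm` (Python 3.9+); the function name and the dictionary keys naming transitions are illustrative, and the sketch computes a single scaling factor for exactly the values it is given.

```python
from fractions import Fraction
from math import lcm                  # accepts any number of arguments (3.9+)
from typing import Dict, Hashable


def to_natural_weights(probs: Dict[Hashable, Fraction]) -> Dict[Hashable, int]:
    """Scale rational probabilities by the lcm of their denominators."""
    factor = lcm(*(p.denominator for p in probs.values()))
    return {key: int(p * factor) for key, p in probs.items()}


# The toss example: two outcomes with probability 1/2 each become weight 1.
toss = to_natural_weights({"c1": Fraction(1, 2), "c2": Fraction(1, 2)})
```

Each resulting weight can then be written as a binary string, e.g. `format(5, "b")` gives `"101"`.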

Now we are ready to show how to specify a PTS $\mathfrak{S}$ in our framework. Fix a finite alphabet $\Sigma$ containing at least two letters 0 and 1. We encode the domain of $\mathfrak{S}$ as words over $\Sigma$. In addition, a natural weight $n \in \mathbb{N}$ can be encoded in the usual way as a binary string. This motivates the following definition.


Definition 1. Let $\mathfrak{S}$ be a PTS $(S; \{\delta_a\}_{a \in \mathrm{ACT}})$. We say that $\mathfrak{S}$ is regular if the domain $S$ is a regular subset of $\Sigma^*$ (i.e. definable by a first-order formula $\varphi(x)$ with one free variable over $\mathfrak{U}$), and if the graph of each function $\delta_a$ is a ternary regular relation (i.e. definable by a first-order formula $\varphi(x,y,z)$ over $\mathfrak{U}$, where $x$ and $y$ encode configurations, and $z$ encodes a natural weight).


Definition 1 is quite general since it allows for an infinite number of different natural weights in the PTS. Note that we can make do without the second sort (of numeric weights) if we have only finitely many numeric weights $n_1, \ldots, n_m$. This can be achieved by specifying a regular relation $R_{a,i}$ for each action label $a \in \mathrm{ACT}$ and numeric weight $n_i$ with $i \in \{1, \ldots, m\}$.
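Encoding weights as binary strings is what makes arithmetic on them expressible: the proof of Theorem 1 below relies on binary addition being a regular relation. The sketch shows the standard synchronous automaton for $x + y = z$, whose only state is the current carry. It assumes least-significant-bit-first binary (our choice for this sketch, not necessarily the paper's encoding), so that the $\perp$-padding at the end of shorter words acts as leading zeros; the helper `lsb` is ours.

```python
def add_accepts(x: str, y: str, z: str) -> bool:
    """Run a carry automaton over the convolution x ⊗ y ⊗ z (LSB first).

    State = current carry in {0, 1}; a padded position is read as digit 0.
    Accept iff every column is consistent and no carry is left at the end."""
    carry = 0
    for i in range(max(len(x), len(y), len(z))):
        a, b, c = (int(w[i]) if i < len(w) else 0 for w in (x, y, z))
        total = a + b + carry
        if total % 2 != c:        # no transition on this letter: reject
            return False
        carry = total // 2
    return carry == 0


def lsb(n: int) -> str:
    """Binary encoding of n >= 0, least significant bit first."""
    return format(n, "b")[::-1]
```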


Example 1. We show a regular encoding of a very simple PTS: a random walk on the 
set of natural numbers. At each position x, the system can non-deterministically choose 
to loop or to move. If the system chooses to loop, it will stay at the same position with 
probability 1. If the system chooses to move, it will move to x + 1 with probability 1/4, 
or move to max(0, x − 1) with probability 3/4. Normalising the probability values by
multiplying by 4, we obtain the numeric weights of 4, 1, and 3 for the aforementioned 
transitions, respectively. 

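To represent the system by regular relations, we encode the positions in unary and the numeric weights in binary. The set of configurations is the regular language $1^*$. The graph of the transition probability function can be described by a first-order formula $\varphi(x,y,z) := \varphi_{\mathrm{loop}}(x,y,z) \vee \varphi_{\mathrm{move}}(x,y,z)$ over $\mathfrak{U}$, where

$$\varphi_{\mathrm{loop}}(x,y,z) := x \in 1^* \wedge y \in 1^* \wedge \big((x = y \wedge z = 100) \vee (x \neq y \wedge z = 0)\big);$$

$$\begin{aligned}
\varphi_{\mathrm{move}}(x,y,z) := x \in 1^* \wedge y \in 1^* \wedge \big(&(x \prec_{\mathrm{succ}} y \wedge z = 1) \vee (y \prec_{\mathrm{succ}} x \wedge z = 11) \vee (x = \epsilon \wedge y = \epsilon \wedge z = 11) \vee {} \\
&(\neg(x \prec_{\mathrm{succ}} y) \wedge \neg(y \prec_{\mathrm{succ}} x) \wedge \neg(x = \epsilon \wedge y = \epsilon) \wedge z = 0)\big).
\end{aligned}$$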
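To sanity-check Example 1, the sketch below evaluates $\varphi_{\mathrm{loop}}$ and $\varphi_{\mathrm{move}}$ directly on strings, testing membership in $1^*$ with a regular expression and implementing $\prec_{\mathrm{succ}}$ as a one-letter extension. The helper `weight` is ours: it reads back the unique weight a formula assigns to a pair (searching only the weights 0, 1, 11 and 100 that occur in this example), so one can confirm that the weights leaving every position sum to the normalization factor 4.

```python
import re
from typing import Callable

Formula = Callable[[str, str, str], bool]


def in_ones(w: str) -> bool:
    """Membership in the regular language 1*."""
    return re.fullmatch(r"1*", w) is not None


def succ(u: str, v: str) -> bool:
    """u ≺_succ v: v extends u by exactly one letter."""
    return v.startswith(u) and len(v) == len(u) + 1


def phi_loop(x: str, y: str, z: str) -> bool:
    return in_ones(x) and in_ones(y) and (
        (x == y and z == "100") or (x != y and z == "0"))


def phi_move(x: str, y: str, z: str) -> bool:
    both_eps = x == "" and y == ""
    return in_ones(x) and in_ones(y) and (
        (succ(x, y) and z == "1")            # x -> x + 1, weight 1
        or (succ(y, x) and z == "11")        # x -> x - 1, weight 3
        or (both_eps and z == "11")          # 0 -> max(0, -1) = 0, weight 3
        or (not succ(x, y) and not succ(y, x) and not both_eps and z == "0"))


def weight(phi: Formula, x: str, y: str) -> int:
    """The unique weight z with phi(x, y, z), read as MSB-first binary.

    Unpacking into a one-element tuple raises ValueError if phi is not
    functional on (x, y) over the searched weights."""
    (z,) = [c for c in ("0", "1", "11", "100") if phi(x, y, c)]
    return int(z, 2)
```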


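Example 2. As a second example, consider a PTS (from [25], Example 1) described by a probabilistic pushdown automaton with states $Q = \{p, q, r\}$ and stack symbols $\Gamma = \{X, X', Y, Z\}$. There is a unique action $a$, and the transition rules $\delta_a$ are as follows:

$$\begin{array}{llll}
pX \xrightarrow{0.5} qXX & pX \xrightarrow{0.5} p & qX \xrightarrow{1} pXX & rY \xrightarrow{1} rXX \\
rX \xrightarrow{0.3} rYX & rX \xrightarrow{0.2} rYX' & rX \xrightarrow{0.5} r & \\
rX' \xrightarrow{0.4} rYX & rX' \xrightarrow{0.1} rYX' & rX' \xrightarrow{0.5} r &
\end{array}$$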
A configuration of the PTS is a word in $Q\Gamma^*$, consisting of a state in $Q$ and a word over the stack symbols. A transition can be applied if the prefix of the configuration matches the left-hand side of the transition rules above. We encode the PTS as follows: the set of configurations is $Q\Gamma^*$, the weights are represented in binary after normalization (multiplying every probability by 10), and the transition relation $\varphi(x,y,z)$ encodes the transition rules in disjunction. For example, the disjunct corresponding to the rule $pX \xrightarrow{0.5} qXX$ is

$$x \in Q\Gamma^* \wedge y \in Q\Gamma^* \wedge (\exists u.\ x = pXu \wedge y = qXXu) \wedge z = 101.$$


Note that the PTS is bounded branching with a bound 3. 
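A direct way to read the rules is as prefix rewriting on configurations. The sketch below, under our own encoding choices (configurations as tuples of symbols so that $X'$ is a single symbol, probabilities as exact fractions, and the hypothetical names `RULES` and `successors`), computes $\delta_a(c)$ for a configuration $c$ and lets one confirm the branching bound of 3. It transcribes the rules listed in this example, including $pX \to qXX$ and $pX \to p$ with probability 0.5 each.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

Config = Tuple[str, ...]          # a control state followed by stack symbols
F = Fraction

# The rules of Example 2 as (left-hand side, right-hand side, probability).
RULES: List[Tuple[Config, Config, Fraction]] = [
    (("p", "X"), ("q", "X", "X"), F(1, 2)), (("p", "X"), ("p",), F(1, 2)),
    (("q", "X"), ("p", "X", "X"), F(1)), (("r", "Y"), ("r", "X", "X"), F(1)),
    (("r", "X"), ("r", "Y", "X"), F(3, 10)), (("r", "X"), ("r", "Y", "X'"), F(1, 5)),
    (("r", "X"), ("r",), F(1, 2)),
    (("r", "X'"), ("r", "Y", "X"), F(2, 5)), (("r", "X'"), ("r", "Y", "X'"), F(1, 10)),
    (("r", "X'"), ("r",), F(1, 2)),
]


def successors(c: Config) -> Dict[Config, Fraction]:
    """delta_a(c): apply every rule whose left-hand side is a prefix of c.

    An empty result means delta_a(c) = 0, i.e. c is a dead end."""
    dist: Dict[Config, Fraction] = {}
    for lhs, rhs, prob in RULES:
        if c[:len(lhs)] == lhs:
            target = rhs + c[len(lhs):]
            dist[target] = dist.get(target, F(0)) + prob
    return dist
```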


4.2 Proof Rules for Probabilistic Bisimulation 


Fix the set $\mathrm{ACT}$ of action symbols and the branching bound $N \ge 1$, owing to the minimal probability assumption. Consider a two-sorted vocabulary $\sigma = (\{P_a\}_{a \in \mathrm{ACT}}, R, +)$, where $P_a$ is a ternary relation (with the first two arguments over the first sort, and the third argument over the second sort of natural numbers), $R$ is a binary relation over the first sort, and $+$ is the addition function over the second sort of natural numbers. The main result we shall show next is summarized in the following theorem:


Theorem 1. There is a fixed first-order formula $\Phi$ over $\sigma$ such that a binary relation $R$ is a probabilistic bisimulation over a bounded-branching PTS $\mathfrak{S} = (S; \{\delta_a\}_{a \in \mathrm{ACT}})$ iff $(\mathfrak{S}, R) \models \Phi$. Furthermore, when $\mathfrak{S}$ is a regular PTS and $R$ is a regular relation, we can compute in polynomial time a first-order formula $\Phi'$ over $\mathfrak{U}$ such that $R$ is a probabilistic bisimulation over $\mathfrak{S}$ iff $\mathfrak{U} \models \Phi'$.


This theorem implies the following result: 


Theorem 2. Given a regular relation $E \subseteq \Sigma^* \times \Sigma^*$ and a bounded-branching regular PTS $\mathfrak{S} = (S; \{\delta_a\}_{a \in \mathrm{ACT}})$, there exists an algorithm that either finds $(u,v) \in E$ which are not probabilistically bisimilar, or finds a regular probabilistic bisimulation relation $R$ over $\mathfrak{S}$ such that $E \subseteq R$ if one exists. The algorithm does not terminate iff $E$ is contained in some probabilistic bisimulation relation but every probabilistic bisimulation $R$ containing $E$ is not regular.


Note that when verifying parameterized systems we are typically interested in checking bisimilarity over a set of pairs (instead of just one pair) of configurations, hence the set $E$ in the above statement.


Proof of Theorem 2. To prove this, we provide two semi-algorithms, one for checking the existence of $R$ and the other for showing that a pair $(v,w) \in E$ is a witness for non-bisimilarity.

By Theorem 1, we can enumerate all possible candidate regular relations $R$ and effectively check that $R$ is a probabilistic bisimulation over $\mathfrak{S}$. The condition that $E \subseteq R$ is a first-order property, and so can be checked effectively.

To show that non-bisimilarity is recursively enumerable, observe that if we fix $(v,w) \in E$ and a number $d$, then the restrictions $\mathfrak{S}_v$ and $\mathfrak{S}_w$ to configurations that are at distance at most $d$ from $v$ and $w$ (respectively) are finite PTSs. Therefore, we can devise a semi-algorithm which enumerates all $(v,w) \in E$, all $d \in \mathbb{N}$, and all probabilistic modal logic (PML) formulas [34] $F$ over $\mathrm{ACT}$ containing only rational numbers (i.e. formulas of the form $\langle a \rangle_\mu F'$, where $\mu \in [0,1]$ is a rational number, which is sufficient because we assume only rational numbers in the PTS). We need to check that $\mathfrak{S}_v, v \models F$, but $\mathfrak{S}_w, w \not\models F$. Model checking PML formulas over finite systems is decidable (in fact, the logic is subsumed by Probabilistic CTL [7]), which makes our check effective.


4.3 Proof of Theorem 1 


In the rest of the section, we shall give a proof of Theorem 1. Given a binary relation $R \subseteq S \times S$, we can write a first-order formula $F_{eq}(R)$ for checking that $R$ is an equivalence relation:

$$\forall s,t,u \in S.\ R(s,s) \wedge \big(R(s,t) \to R(t,s)\big) \wedge \big((R(s,t) \wedge R(t,u)) \to R(s,u)\big).$$


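We shall next define a formula $\varphi_a(p,q)$ for each $a \in \mathrm{ACT}$, such that $R$ is a probabilistic bisimulation for $\mathfrak{S} = (S; \{\delta_a\}_{a \in \mathrm{ACT}})$ iff $(\mathfrak{S}, R) \models \Phi(R)$, where

$$\Phi(R) := F_{eq}(R) \wedge \forall p,q \in S.\ R(p,q) \to \bigwedge_{a \in \mathrm{ACT}} \Big(\big(\psi_a(p) \wedge \psi_a(q)\big) \vee \varphi_a(p,q)\Big). \tag{3}$$

The formula $\psi_a(s) := \forall s' \in S.\ \delta_a(s,s') = 0$ states that configuration $s$ cannot move to any configuration through action $a$.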

Before we describe $\varphi_a(p,q)$, we provide some intuition and define some intermediate macros. Fix configurations $p$ and $q$. Informally, $\varphi_a(p,q)$ will first guess a set of configurations $u_1, \ldots, u_N$ containing the successors of $p$ on action $a$, and a set of configurations $v_1, \ldots, v_N$ containing the successors of $q$ on action $a$. Second, it will guess labellings $\alpha_1, \ldots, \alpha_N$ and $\beta_1, \ldots, \beta_N$, which correspond to partitionings of the configurations $u_1, \ldots, u_N$ and $v_1, \ldots, v_N$, respectively. The intuition is that the $\alpha$'s and $\beta$'s “name” the partitions: if $\alpha_i = \alpha_j$ (resp. $\beta_i = \beta_j$), then $u_i$ and $u_j$ (resp. $v_i$ and $v_j$) are guessed to be in the same partition. The formula then checks that the guessed partitioning is compatible with the equivalence relation $R$ (i.e. if the labelling claims $u_i$ and $u_j$ are in the same partition, then indeed $R(u_i, u_j)$ holds), and that the probability masses of the partitions assigned by configurations $p$ and $q$ satisfy the constraint given in (1). For the first part, we define a formula


$$\mathit{SUCC}_a(w; u_1, \ldots, u_N) := \Big(\bigwedge_{1 \le i < j \le N} u_i \neq u_j\Big) \wedge \Big(\forall u \in S.\ \delta_a(w,u) \neq 0 \to \bigvee_{1 \le i \le N} u = u_i\Big),$$

stating that the successors of configuration $w$ on action $a$ are among the $N$ distinct configurations $u_1, \ldots, u_N$. Note that a configuration may have fewer than $N$ successors. In this case, we can set the rest of the variables to arbitrary distinct configurations.

For the second part, we shall check that $R$ is compatible with the guessed partitions, and that configurations $p$ and $q$ assign the same probability mass to the same partition. Let $k_1, \ldots, k_n$ be a labelling for configurations $s_1, \ldots, s_n$. To check that the partitioning induced by the labelling is compatible with $R$, we need to express the condition that $k_i = k_j$ if and only if $R(s_i, s_j)$ holds. To this end, we define a formula

$$\mathit{compat}_R(s_1, \ldots, s_n; k_1, \ldots, k_n) := \bigwedge_{1 \le i,j \le n} \big(R(s_i, s_j) \leftrightarrow k_i = k_j\big).$$


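Now, we are ready to define $\varphi_a(p,q)$:

$$\begin{aligned}
\varphi_a(p,q) := {} & \exists u_1, \ldots, u_N, v_1, \ldots, v_N \in S.\ \exists \alpha_1, \ldots, \alpha_N, \beta_1, \ldots, \beta_N \in \mathbb{N}. \\
& \mathit{SUCC}_a(p; u_1, \ldots, u_N) \wedge \mathit{SUCC}_a(q; v_1, \ldots, v_N) \wedge {} \\
& \mathit{compat}_R(u_1, \ldots, u_N, v_1, \ldots, v_N; \alpha_1, \ldots, \alpha_N, \beta_1, \ldots, \beta_N) \wedge {} \\
& \forall k \in \mathbb{N}.\ \Big(\sum_{i : \alpha_i = k} \delta_a(p, u_i) = \sum_{j : \beta_j = k} \delta_a(q, v_j)\Big).
\end{aligned} \tag{4}$$

With this definition, $\varphi_a(p,q)$ holds if and only if $p \xrightarrow{\mu}_a S' \iff q \xrightarrow{\mu}_a S'$ holds for any $\mu > 0$ and equivalence class $S' \in S/R$.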
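For fixed choices of the existentially quantified variables, the body of formula (4) is a finite check. The sketch below evaluates the $\mathit{compat}_R$ conjunct and the per-label mass equality for given successor lists $u_i$, $v_j$ with their probabilities and labels $\alpha_i$, $\beta_j$; it does not check $\mathit{SUCC}_a$ (the successors are assumed to be supplied correctly) and does not search over labellings. Names and data representation are illustrative.

```python
from fractions import Fraction
from typing import Dict, Hashable, Sequence, Set, Tuple

Config = Hashable
Relation = Set[Tuple[Config, Config]]


def compat(R: Relation, configs: Sequence[Config], labels: Sequence[int]) -> bool:
    """compat_R: for all i, j, R(s_i, s_j) holds iff k_i == k_j."""
    n = len(configs)
    return all(((configs[i], configs[j]) in R) == (labels[i] == labels[j])
               for i in range(n) for j in range(n))


def label_masses(probs: Sequence[Fraction], labels: Sequence[int]) -> Dict[int, Fraction]:
    """For each label k, the total probability of the successors labelled k."""
    out: Dict[int, Fraction] = {}
    for p, k in zip(probs, labels):
        out[k] = out.get(k, Fraction(0)) + p
    return out


def phi_a_body(R: Relation,
               us: Sequence[Config], p_probs: Sequence[Fraction], alphas: Sequence[int],
               vs: Sequence[Config], q_probs: Sequence[Fraction], betas: Sequence[int]) -> bool:
    """compat_R on u's and v's together, then equal mass for every label k.

    p_probs[i] stands for delta_a(p, us[i]); q_probs[j] for delta_a(q, vs[j])."""
    if not compat(R, list(us) + list(vs), list(alphas) + list(betas)):
        return False
    mp, mq = label_masses(p_probs, alphas), label_masses(q_probs, betas)
    return all(mp.get(k, Fraction(0)) == mq.get(k, Fraction(0))
               for k in set(mp) | set(mq))
```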


Example 3. Consider the PTS from Example 2. The configurations $pXZ$ and $rX$ are probabilistic bisimilar. This can be seen using a probabilistic bisimulation relation with equivalence classes $\{pX^kZ\} \cup \{rw : w \in \{X, X'\}^k\}$ for all $k \ge 0$ and $\{qX^{k+1}Z\} \cup \{rYw : w \in \{X, X'\}^k\}$ for all $k \ge 1$. The probabilistic bisimulation relation is definable as the symmetric closure of a regular relation $R$, where $(w_1, w_2) \in R$ iff

$$\begin{aligned}
& (w_1 = w_2) \vee {}\\
& (w_1 \in pX^*Z \wedge w_2 \in r(X + X')^* \wedge |w_1| = |w_2|) \vee {}\\
& (w_1 \in r(X + X')^* \wedge w_2 \in r(X + X')^* \wedge |w_1| = |w_2|) \vee {}\\
& (w_1 \in qX^*Z \wedge w_2 \in rY(X + X')^* \wedge |w_1| = |w_2|) \vee {}\\
& (w_1 \in rY(X + X')^* \wedge w_2 \in rY(X + X')^* \wedge |w_1| = |w_2|).
\end{aligned}$$


For this example, the formula (3) simplifies to $\Phi_{\mathit{eq}}(R) \wedge \forall p, q \in S.\ (R(p, q) \Rightarrow \varphi_a(p, q))$ for the unique action $a$. This formula expresses, symbolically and for all configurations at once, the condition that $R$ is a probabilistic bisimulation. To see the formula in action, fix configurations $pXZ$ and $rX$, which are probabilistic bisimilar. In the PTS, $pXZ$ has two successors, $qXXZ$ and $pZ$, each with probability 0.5, and $rX$ has three successors: $rYX$ with probability 0.3, $rYX'$ with probability 0.2, and $r$ with probability 0.5. In the formula for $\varphi_a(p, q)$, we can set the successors $u_i$ of $pXZ$ and the successors $v_j$ of $rX$ as above (the third "successor" $u_3$ is set to an arbitrary configuration not reachable from $pXZ$), and set $\alpha_1 = 1$, $\alpha_2 = 2$, $\beta_1 = \beta_2 = 1$, and $\beta_3 = 2$, corresponding to the equivalence classes of the bisimulation relation. One can check that the probability masses assigned to these classes are the same: $0.5 = 0.3 + 0.2$ for class 1 and $0.5 = 0.5$ for class 2.
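The mass comparison in the last conjunct of (4) can be replayed numerically for this instance. The Python sketch below hard-codes the successor probabilities of $pXZ$ and $rX$ stated above together with the labels $\alpha$ and $\beta$; the padding successor $u_3$ has probability 0 and is omitted. The helper name `class_masses` is hypothetical, and exact fractions are used to avoid any floating-point rounding.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Dict, Sequence, Tuple


def class_masses(successors: Sequence[Tuple[str, Fraction]],
                 labels: Sequence[int]) -> Dict[int, Fraction]:
    """Total transition probability sent into each labelled (guessed) equivalence class."""
    masses: Dict[int, Fraction] = defaultdict(Fraction)
    for (_config, prob), label in zip(successors, labels):
        masses[label] += prob
    return dict(masses)


# Successors of pXZ and rX from Example 3; the padding successor u_3 has probability 0.
succ_pXZ = [("qXXZ", Fraction(1, 2)), ("pZ", Fraction(1, 2))]
succ_rX = [("rYX", Fraction(3, 10)), ("rYX'", Fraction(1, 5)), ("r", Fraction(1, 2))]
alpha = [1, 2]     # labels alpha_1, alpha_2
beta = [1, 1, 2]   # labels beta_1, beta_2, beta_3

print(class_masses(succ_pXZ, alpha))                                 # {1: Fraction(1, 2), 2: Fraction(1, 2)}
print(class_masses(succ_pXZ, alpha) == class_masses(succ_rX, beta))  # True
```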

We remark that the first-order theory of $\mathfrak{U}$ is sufficient to encode any probabilistic pushdown automaton, not just this example.


We proceed to show that if $R$ and $\delta_a$ are first-order definable over $\mathfrak{U}$, then so are $\psi_a$ and $\varphi_a$. Suppose that $\delta_a$ is encoded using the ternary relation $\delta_a(x, y, z)$, as stated in the previous section. (We shall re-use the symbol $\delta_a$ here to avoid a clash of names.) We define $\psi_a(s) := \forall s' \in S.\ \forall z \in \mathbb{N}.\ \delta_a(s, s', z) \Leftrightarrow z = 0$. To define $\varphi_a$, the key point is to express the sum of transition probabilities in the logic. We use the fact that addition of integers in binary encoding is regular (see e.g. [9]), and write a formula that performs iterated addition. Formally, for each $a \in \mathit{ACT}$ we define a formula $\chi_a$ such that


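$$\chi_a(u; u_1, \ldots, u_N; \alpha_1, \ldots, \alpha_N; k; z) := \exists z_1, \ldots, z_{N+1} \in \mathbb{N}.\ z_1 = 0 \wedge z_{N+1} = z \wedge \bigwedge_{1 \le i \le N} \hat{\chi}_a(u, u_i, \alpha_i, k, z_i, z_{i+1}),$$

where

$$\hat{\chi}_a(u, u', k', k, x, y) := \big(k = k' \wedge \exists z.\ \delta_a(u, u', z) \wedge y = x + z\big) \vee \big(k \neq k' \wedge y = x\big)$$

performs a single addition (we use the fact that addition "$y = x + z$" in binary is encodable as a regular relation), and $z_1, \ldots, z_{N+1}$ store the intermediate sums. Hence, given $k \in \mathbb{N}$, $u_1, \ldots, u_N, v_1, \ldots, v_N \in S$, and $\alpha_1, \ldots, \alpha_N, \beta_1, \ldots, \beta_N \in \mathbb{N}$,

$$\sum_{\alpha_i = k} \delta_a(p, u_i) = \sum_{\beta_i = k} \delta_a(q, v_i)$$

if and only if

$$\exists z \in \mathbb{N}.\ \chi_a(p; u_1, \ldots, u_N; \alpha_1, \ldots, \alpha_N; k; z) \wedge \chi_a(q; v_1, \ldots, v_N; \beta_1, \ldots, \beta_N; k; z).$$

It follows that $\varphi_a(p, q)$ defined in (4) can be encoded in the first-order theory of $\mathfrak{U}$.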
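The iterated addition performed by $\chi_a$ can be traced on concrete integer weights. The Python sketch below is a finite, executable analogue rather than the encoding over $\mathfrak{U}$: it assumes integer weights (here the probabilities of Example 3 scaled by a hypothetical common multiple $w = 10$), and `chi_running_sums` is a hypothetical helper returning the intermediate sums $z_1, \ldots, z_{N+1}$.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def chi_running_sums(delta: Callable[[str, str], int], u: str,
                     succs: Sequence[str], labels: Sequence[int], k: int) -> List[int]:
    """Intermediate sums z_1, ..., z_{N+1} of chi_a(u; succs; labels; k; z).

    z_1 = 0; step i adds delta(u, u_i) when labels[i] == k (the k = k' branch)
    and copies z_i otherwise (the k != k' branch). chi_a holds iff z = z_{N+1}.
    """
    zs = [0]
    for succ, label in zip(succs, labels):
        zs.append(zs[-1] + (delta(u, succ) if label == k else 0))
    return zs


# Example 3 with probabilities scaled by an assumed common multiple w = 10.
WEIGHTS: Dict[Tuple[str, str], int] = {
    ("pXZ", "qXXZ"): 5, ("pXZ", "pZ"): 5,
    ("rX", "rYX"): 3, ("rX", "rYX'"): 2, ("rX", "r"): 5,
}


def delta(x: str, y: str) -> int:
    return WEIGHTS.get((x, y), 0)


# The padding successor u_3 of pXZ (weight 0) is omitted.
p_sums = chi_running_sums(delta, "pXZ", ["qXXZ", "pZ"], [1, 2], k=1)
q_sums = chi_running_sums(delta, "rX", ["rYX", "rYX'", "r"], [1, 1, 2], k=1)
print(p_sums, q_sums)  # [0, 5, 5] [0, 3, 5, 5]
```

Both runs end in the same value $z = 5$, which is the common witness required by the existential quantifier above for $k = 1$.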


Remark. Note that checking the validity of a given presentation of a regular PTS is algorithmic. To see this, suppose we are given a set of formulae $\{\delta_a(x, y, z)\}_{a \in \mathit{ACT}}$ that is claimed to encode the probabilistic transition functions of a PTS with a branching bound $N$. Fix a formula $\delta_a$. First, we need to check that for all $x \in S$, there are at most $N$ distinct $y$'s such that $\delta_a(x, y, z)$ holds for some $z \neq 0$. Second, we need to check that $[\![\delta_a]\!]$ is a function, i.e., $\forall x, y.\ \exists! z.\ \delta_a(x, y, z)$, where $\exists! z.\ \phi(z)$ is a shorthand for the formula asserting that there exists precisely one $z$ such that $\phi(z)$ is true. Third, we need to check that $[\![\delta_a]\!]$ encodes a mapping $S \to \{0\} \cup D_S$. The first two requirements are easily seen to be expressible as first-order formulas and hence can be checked algorithmically over $\mathfrak{U}$. The third requirement amounts to checking the assertion that there exists $w_a \in \mathbb{N}$ satisfying

$$\begin{aligned}
\forall x \in S.\ & \big(\forall y \in S.\ \forall z \in \mathbb{N}.\ \delta_a(x, y, z) \Leftrightarrow z = 0\big) \vee {}\\
& \Big(\exists y_1, \ldots, y_N \in S.\ \exists z_1, \ldots, z_N \in \mathbb{N}.\ \mathit{succ}_a(x; y_1, \ldots, y_N) \wedge \bigwedge_{1 \le i \le N} \delta_a(x, y_i, z_i) \wedge \sum_{1 \le i \le N} z_i = w_a\Big),
\end{aligned}$$

which is a first-order formula and can be checked algorithmically over $\mathfrak{U}$ because summation of a fixed number of weights is regular (as shown earlier in this section). Finally, since all of the $w_a$'s are expected to be the same common multiple of the denominators of the transition probabilities, we need to check that there is $w \in \mathbb{N}$ such that $w_a = w$ for all $a \in \mathit{ACT}$. This is again algorithmic, as we can pinpoint the exact value of each $w_a$ by enumeration.
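On an explicitly listed finite fragment, the checks of this remark reduce to simple table inspections, which may help to see what is being verified. The Python sketch below is only such a finite analogue (the actual checks are first-order queries over $\mathfrak{U}$). It assumes the weights are given as a nested dictionary, so functionality of $[\![\delta_a]\!]$ holds by construction; the name `common_weight` and the example weights are hypothetical.

```python
from typing import Dict, Optional

# action -> source configuration -> target configuration -> natural-number weight
Table = Dict[str, Dict[str, Dict[str, int]]]


def common_weight(table: Table, bound: int) -> Optional[int]:
    """Return the common weight w if the table passes the finite checks, else None.

    Checks: (1) every source has at most `bound` targets with non-zero weight;
    (3) every source either has only zero weights or its weights sum to w_a;
    and all w_a coincide. Check (2), functionality, holds because a dictionary
    maps each (source, target) pair to exactly one weight.
    """
    w: Optional[int] = None
    for rows in table.values():
        for row in rows.values():
            nonzero = [z for z in row.values() if z != 0]
            if any(z < 0 for z in nonzero) or len(nonzero) > bound:
                return None
            if not nonzero:
                continue  # delta_a is the zero function at this source
            total = sum(nonzero)
            if w is None:
                w = total
            elif total != w:
                return None
    return w


table = {"a": {"pXZ": {"qXXZ": 5, "pZ": 5}, "rX": {"rYX": 3, "rYX'": 2, "r": 5}}}
print(common_weight(table, bound=3))  # 10
print(common_weight(table, bound=2))  # None: rX has three successors
```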
5 Application to Anonymity Verification


In this section, we show how to verify the anonymity property of cryptographic pro- 
tocols via computation of probabilistic bisimulations. We shall first formalize the con- 
nection between the concepts of anonymity and probabilistic bisimulation. We then 
introduce a verification framework and apply it to verify the anonymity property of the 
dining cryptographers protocol [16] and the grades protocol [29]. 

A (discrete-time) Markov chain (a.k.a. DTMC) is a structure $M := (S; \delta; L)$ where $S$ is a set of configurations, $\delta : S \to D_S$ is a family of probability distributions, and $L : S \to \mathit{ACT}$ is a labelling of the states. We shall use $\delta(s, s')$ to denote $\delta(s)(s')$, the transition probability from $s$ to $s'$. A sequence $s_0 \cdots s_n \in S^*$ is called a path of $M$ if $\delta(s_i, s_{i+1}) > 0$ for $i \in \{0, \ldots, n-1\}$. The probability distribution induced by the paths in a DTMC can be defined using a standard cylinder construction (see e.g. [33]) as follows. Given a finite path $\pi := s_0 \cdots s_n \in S^*$, we set $\mathit{Run}_\pi$ to be a basic cylinder, which is the set of all finite/infinite paths with $\pi$ as a prefix. We associate this cylinder with probability $\Pr^{s_0}(\mathit{Run}_\pi) := \prod_{i=0}^{n-1} \delta(s_i, s_{i+1})$. This gives rise to a unique probability measure for the $\sigma$-algebra over the set of all paths from $s_0$.
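For a finite DTMC, the probability of a basic cylinder is just the product of the transition probabilities along its defining path. The Python sketch below assumes $\delta$ is stored as a nested dictionary of exact fractions and requires Python 3.8 or later (for `math.prod`); the two-state chain and the name `cylinder_probability` are hypothetical illustrations.

```python
from fractions import Fraction
from math import prod
from typing import Dict, Sequence

# delta(s)(s') as a nested dictionary; missing entries mean probability 0.
Chain = Dict[str, Dict[str, Fraction]]


def cylinder_probability(delta: Chain, path: Sequence[str]) -> Fraction:
    """Pr^{s_0}(Run_pi) = delta(s_0, s_1) * ... * delta(s_{n-1}, s_n)."""
    return prod(
        (delta.get(s, {}).get(t, Fraction(0)) for s, t in zip(path, path[1:])),
        start=Fraction(1),
    )


delta: Chain = {
    "s0": {"s0": Fraction(1, 2), "s1": Fraction(1, 2)},
    "s1": {"s1": Fraction(1)},
}
print(cylinder_probability(delta, ["s0", "s0", "s1"]))  # 1/4
print(cylinder_probability(delta, ["s0"]))              # 1 (empty product)
```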

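Given a PTS $\mathfrak{G} := (S; \{\delta_a\}_{a \in \mathit{ACT}})$, an adversary $f : S^* \to \mathit{ACT}$ resolves the non-determinacy of $\mathfrak{G}$ and induces a DTMC $\mathfrak{G}_f := (S'; \delta'; L')$. Here $S' := S^* \cup \{\$\}$ contains all finite paths of $\mathfrak{G}$ plus a "sink state" $\$$ such that $\delta'(\pi) := \mathbf{1}_{\$}$² if and only if either $\pi = \$$, or $\delta_{f(\pi)}$ is the zero function at the last configuration of $\pi$. Otherwise, $\delta'(\pi)$ is given by $\delta_{f(\pi)}$ at the last configuration of $\pi$; that is, for $\pi = s_0 \cdots s_n$, the path $\pi s$ receives probability $\delta_{f(\pi)}(s_n, s)$. The labelling of $\mathfrak{G}_f$ is defined as $L'(\$) := \bot$ and $L'(\pi) := f(\pi)$ for $\pi \in S^*$.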

Given a DTMC $(S; \delta; L)$, the trace of a path $\pi := s_0 \cdots s_n \in S^*$ is defined as $\tau(\pi) := L(s_0) \cdots L(s_n)$. A trace event $T$ is a set of finite traces; the probability of $T$ with respect to a configuration $s$ is specified as $\Pr^s(T) := \Pr^s\big(\bigcup\{\mathit{Run}_\pi : \tau(\pi) \in T,\ \pi \text{ starts from } s\}\big)$.
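When every trace in $T$ has the same length $n$, distinct matching paths of $n$ configurations yield disjoint cylinders, so $\Pr^s(T)$ is simply a sum of cylinder probabilities. The Python sketch below implements only this restricted case (events mixing traces of different lengths would need care with overlapping cylinders); the chain, its labels, and the name `trace_event_probability` are hypothetical.

```python
from fractions import Fraction
from typing import Dict, Set, Tuple

Chain = Dict[str, Dict[str, Fraction]]


def trace_event_probability(delta: Chain, label: Dict[str, str], start: str,
                            event: Set[Tuple[str, ...]], length: int) -> Fraction:
    """Pr^start(T) for an event T whose traces all consist of `length` labels."""
    frontier = [((start,), Fraction(1))]  # (path, probability of its cylinder)
    for _ in range(length - 1):
        frontier = [
            (path + (t,), p * q)
            for path, p in frontier
            for t, q in delta.get(path[-1], {}).items()
            if q > 0
        ]
    # Paths with the same number of configurations have disjoint cylinders.
    return sum(
        (p for path, p in frontier if tuple(label[s] for s in path) in event),
        Fraction(0),
    )


delta: Chain = {
    "s0": {"s0": Fraction(1, 2), "s1": Fraction(1, 2)},
    "s1": {"s1": Fraction(1)},
}
label = {"s0": "a", "s1": "b"}
print(trace_event_probability(delta, label, "s0", {("a", "b")}, 2))  # 1/2
print(trace_event_probability(delta, label, "s0", {("a", "a", "b"), ("a", "b", "b")}, 3))  # 3/4
```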

Now we are ready to define the concept of anonymity. Fix $\mathfrak{G} := (S; \{\delta_a\}_{a \in \mathit{ACT}})$ and a set $I \subseteq S$ of initial configurations. We say $\mathfrak{G}$ is anonymous to an adversary $f$ if for all $s \in I$ and trace events $T$, the value of $\Pr^s(T)$ in $\mathfrak{G}_f$ is solely determined by $T$. Intuitively, this means that the adversary cannot obtain any information about a specific initial configuration by experimenting on the system and observing the traces.

We shall only consider external adversaries in this paper. An adversary $f : S^* \to \mathit{ACT}$ is external if $f(s_0 \cdots s_n) = f(s'_0 \cdots s'_n)$ whenever $L(s_i) = L(s'_i)$ for $i \in \{0, \ldots, n\}$. That is, an external adversary takes action solely based on the trace she has observed so far. We call a PTS anonymous if it is anonymous to any external adversary. The following result establishes a connection between the anonymity property and probabilistic bisimulations.


Proposition 2. Let $\mathfrak{G} := (S; \{\delta_a\}_{a \in \mathit{ACT}})$ be a PTS and $f$ be an external adversary for $\mathfrak{G}$. Then for all $u, v \in S$ such that $u \sim v$, $\Pr^u(T) = \Pr^v(T)$ holds for any trace event $T$ in $\mathfrak{G}_f$. That is, configurations $u$ and $v$ induce the same trace distribution in $\mathfrak{G}_f$.


Based on Proposition 2, we propose a framework to verify the anonymity property of $\mathfrak{G}$ as follows. We first specify a "reference system" $\mathfrak{G}' := (S; \{\delta'_a\}_{a \in \mathit{ACT}})$ that has the same initial configurations and actions as those of $\mathfrak{G}$, except that the trace distribution of $\mathfrak{G}'_f$ is independent of specific initial configurations for any adversary $f$. We then try to find a bisimulation relation $R$ between $\mathfrak{G}$ and the reference system $\mathfrak{G}'$ satisfying $R \supseteq \{(s, s') \in I \times I' : s = s'\}$. When such a relation $R$ is found, we can conclude that the trace distribution of $\mathfrak{G}_f$ is also independent of the initial configurations for any adversary $f$, and hence prove the anonymity property of $\mathfrak{G}$.

2 Recall that $\mathbf{1}_s$ denotes the point distribution at $s$, namely $\mathbf{1}_s(s) = 1$.


The Dining Cryptographers Protocol. The dining cryptographers protocol [16] is a multi-party computation algorithm aiming to securely compute the XOR of the secret bits held by the participants. More precisely, consider a ring of $n \ge 3$ participants $p_0, \ldots, p_{n-1}$ such that each participant $p_i$ holds a secret bit $x_i$. To compute $x_0 \oplus \cdots \oplus x_{n-1}$ without revealing information about the values of $x_0, \ldots, x_{n-1}$, the participants carry out a two-stage computation as follows: (i) each two adjacent participants $p_i, p_{i+1}$ compute a random bit $b_i$ that is accessible only to them; (ii) each participant $p_i$ announces the value $a_i := x_i \oplus b_i \oplus b_{i-1}$³ to the other participants. Hence, every participant $p_i$ can observe the values of $x_i$, $b_i$, $b_{i-1}$ and $a_0, \ldots, a_{n-1}$. It turns out that $a_0 \oplus \cdots \oplus a_{n-1} = x_0 \oplus \cdots \oplus x_{n-1}$, since each $b_i$ occurs in exactly two announcements and thus cancels out; so all participants are able to compute the XOR of the secret bits after executing the protocol. Furthermore, the anonymity property of the protocol ensures that no individual participant $p_i$ can infer the values of the other secret bits from the information she has observed during the execution of the protocol.
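The correctness claim can be checked by a direct simulation. The Python sketch below runs one round of the protocol with coins drawn from a seeded pseudo-random generator (an assumption made only for reproducibility) and asserts that the XOR of the announcements equals the XOR of the secrets; the function name `dining_cryptographers` is hypothetical.

```python
import random
from functools import reduce
from operator import xor
from typing import List


def dining_cryptographers(secrets: List[int], rng: random.Random) -> List[int]:
    """One round of the protocol; returns the announcements a_0, ..., a_{n-1}."""
    n = len(secrets)
    # b[i] is the coin shared by p_i and p_{i+1}; subscripts are taken modulo n.
    b = [rng.randrange(2) for _ in range(n)]
    # For i = 0, b[i - 1] is b[-1] = b[n - 1], matching the ring structure.
    return [secrets[i] ^ b[i] ^ b[i - 1] for i in range(n)]


secrets = [0, 1, 0, 1, 1]
announcements = dining_cryptographers(secrets, random.Random(2019))
# Every b_i occurs in exactly two announcements, so it cancels in the XOR.
assert reduce(xor, announcements) == reduce(xor, secrets)
```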

We model the protocol as a length-preserving regular PTS. The configurations of a ring of $n$ participants are encoded as words of size $n$. The initial configurations are words $w \in \{0, 1\}^*$ such that $w[i]$ represents $x_i$ for $i \in \{0, \ldots, |w| - 1\}$. The transition relation consists of six transitions: observer non-deterministically tossing head (via action head), observer non-deterministically tossing tail (via action tail), non-observer tossing head with probability 0.5 (via action toss), non-observer tossing tail with probability 0.5 (via action toss), participant announcing zero (via action zero), and participant announcing one (via action one). The outcomes of the tosses by the observer are visible (i.e. as actions head and tail), while the outcomes of the tosses by the other participants are hidden (i.e. as action toss). Each maximal trace from an initial configuration of size $n$ consists of $n$ successive tossing actions, followed by $n$ successive announcing actions. Starting from an initial configuration $w$ and for $i \in \{0, \ldots, n-1\}$, the $i$-th toss action updates the value of $w[j]$ to $w[j] \oplus b_i$ for $j \in \{i, i+1\}$, where $b_i = 1$ if a head is tossed and $b_i = 0$ otherwise. Any configuration $v$ reached after $n$ tosses thus satisfies $v[i] = x_i \oplus b_i \oplus b_{i-1}$ for $i \in \{0, \ldots, n-1\}$. The PTS then "prints out" the configuration by going through $n$ announcement transitions via actions $a_0, \ldots, a_{n-1}$, such that $a_i$ is one if $v[i] = 1$ and $a_i$ is zero if $v[i] = 0$.
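The effect of the toss transitions on the word encoding can be replayed deterministically once the coin outcomes are fixed. The Python sketch below assumes the outcomes $b_0, \ldots, b_{n-1}$ are supplied explicitly (in the PTS they come from head/tail choices or probabilistic toss transitions); the name `run_word_model` is hypothetical. As a check, it reproduces the maximal trace head toss tail one zero zero from initial configuration 010 used as an example in the next paragraph, where $b_1 = 0$ is the hidden outcome consistent with that trace.

```python
from typing import List, Sequence


def run_word_model(w: str, coins: Sequence[int]) -> List[str]:
    """Apply the n toss updates to w, then 'print out' the configuration.

    coins[i] is b_i (1 for head, 0 for tail); the i-th toss replaces w[j] by
    w[j] XOR b_i for j in {i, i+1}, with positions taken modulo n.
    """
    n = len(w)
    if len(coins) != n:
        raise ValueError("one coin per participant is required")
    v = [int(c) for c in w]
    for i, b in enumerate(coins):
        for j in (i, (i + 1) % n):
            v[j] ^= b
    return ["one" if bit else "zero" for bit in v]


# head (b_0 = 1), hidden toss (b_1 = 0), tail (b_2 = 0) on initial configuration 010.
print(run_word_model("010", [1, 0, 0]))  # ['one', 'zero', 'zero']
```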

We consider the case where the first participant of the protocol is the observer. The maximal traces of the PTS in this case are of the form $t \cdot t'$, where $|t| = |t'|$, $t \in \{\mathit{head}, \mathit{tail}\}\, \mathit{toss}^*\, \{\mathit{head}, \mathit{tail}\}$, and $t' \in \{\mathit{zero}, \mathit{one}\}^*$. For example, head toss tail one zero zero is a maximal trace starting from initial configuration 010. To prove anonymity, we define a reference system such that the initial configurations and the actions are the same as those of the original PTS, except that the announcements $a_0, \ldots, a_{n-1}$ encoded in the maximal traces from an initial configuration $w$ are uniformly distributed over $\{(a_0, \ldots, a_{n-1}) : a_0 \oplus \cdots \oplus a_{n-1} = w[0] \oplus \cdots \oplus w[n-1],\ a_0 = w[0] \oplus b_0 \oplus b_{n-1}\}$.⁴ In this way, the distribution of the announcements is independent of the initial configuration once the values of $x_0 \oplus \cdots \oplus x_{n-1}$, $x_0$, $b_0$, and $b_{n-1}$ (i.e. the information revealed to the first participant) are fixed. We then compute a probabilistic bisimulation between the original system and the reference system, establishing the anonymity property that the first participant cannot infer the secret bits of the other participants from the information she observes.

3 All arithmetical operations on the subscripts are performed modulo $n$ to take the ring structure into account.
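For small rings, the claim that the reference system matches the original protocol from the first participant's point of view can be checked exhaustively. The Python sketch below fixes $b_0$ and $b_{n-1}$ (the coins visible to $p_0$), enumerates the hidden coins $b_1, \ldots, b_{n-2}$, and compares the resulting distribution of announcements with the uniform distribution built as in footnote 4. It is a bounded sanity check under these assumptions ($n \ge 3$, $n \le 5$), not a substitute for the parameterized bisimulation proof; the function names are hypothetical.

```python
from fractions import Fraction
from functools import reduce
from itertools import product
from operator import xor
from typing import Dict, Tuple

Dist = Dict[Tuple[int, ...], Fraction]


def protocol_distribution(w: str, b0: int, b_last: int) -> Dist:
    """Announcement distribution of the protocol over the hidden coins b_1..b_{n-2}."""
    n, x = len(w), [int(c) for c in w]
    middles = list(product((0, 1), repeat=n - 2))
    dist: Dist = {}
    for middle in middles:
        b = (b0, *middle, b_last)
        a = tuple(x[i] ^ b[i] ^ b[i - 1] for i in range(n))
        dist[a] = dist.get(a, Fraction(0)) + Fraction(1, len(middles))
    return dist


def reference_distribution(w: str, b0: int, b_last: int) -> Dist:
    """Footnote 4: a_1..a_{n-2} uniform, then a_0 and a_{n-1} are determined."""
    x = [int(c) for c in w]
    a0 = x[0] ^ b0 ^ b_last
    middles = list(product((0, 1), repeat=len(w) - 2))
    dist: Dist = {}
    for middle in middles:
        a_last = reduce(xor, (a0, *middle, *x))  # a_0 ^ ... ^ a_{n-2} ^ w[0] ^ ... ^ w[n-1]
        dist[(a0, *middle, a_last)] = Fraction(1, len(middles))
    return dist


# Exhaustive comparison for rings of size 3 to 5.
for n in range(3, 6):
    for w in ("".join(bits) for bits in product("01", repeat=n)):
        for b0, b_last in product((0, 1), repeat=2):
            assert protocol_distribution(w, b0, b_last) == reference_distribution(w, b0, b_last)
print("reference distribution matches for n = 3..5")
```

Since the reference distribution depends only on $w[0]$, the parity of $w$, $b_0$ and $b_{n-1}$, two initial configurations agreeing on these values (e.g. 0110 and 0000) induce the same announcement distribution.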


A Generalized Dining Cryptographers Protocol. We have also considered a generalized dining cryptographers protocol where the secret messages $x_0, \ldots, x_{n-1}$ of the $n$ participants are bit-vectors of the same size. Note that the set of initial configurations is not regular when the size of the secret messages is parameterized. To construct a regular model, we allow a configuration to encode secret messages of different sizes, and devise the transition system such that an initial configuration $w$ can finish the protocol (i.e. can have a trace containing all of the announcements $a_0, \ldots, a_{n-1}$) if and only if the messages encoded in $w$ have the same size. The resulting PTS is a regular system; it over-approximates the PTS of the generalized dining cryptographers protocol in the sense that the anonymity property of the former implies that of the latter.


The Grades Protocol. The grades protocol [29] is a multi-party computation algorithm aiming to securely compute the sum of the secrets held by the participants. The setting of the protocol is similar to that of the dining cryptographers: given $n \ge 3$ and $g \ge 2$, we have a ring of $n$ participants $p_0, \ldots, p_{n-1}$ where each participant $p_i$ holds a secret $x_i \in \{0, \ldots, g-1\}$. Note that both $g$ and $n$ are parameterized in this protocol. The goal of the participants is to compute the sum $x_0 + \cdots + x_{n-1}$ without revealing information about the individual secrets. Define $M := (g-1) \cdot n + 1$. The protocol consists of two steps: (i) each two adjacent participants $p_i, p_{i+1}$ compute a random number $y_i \in \{0, \ldots, M-1\}$; (ii) each participant $p_i$ announces $a_i := (x_i + y_i - y_{i-1}) \bmod M$ to the other participants. After executing the protocol, the participants compute $a := a_0 + \cdots + a_{n-1} \bmod M$. Because of the ring structure, the $y_i$'s cancel out in the sum, so $a = (x_0 + \cdots + x_{n-1}) \bmod M$. Since $x_0 + \cdots + x_{n-1} \le (g-1) \cdot n < M$, the value of $a$ equals the sum of all secrets. The anonymity property of the protocol asserts that no participant can infer the secrets held by the other participants from the information she has observed.
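A short simulation illustrates why the choice $M = (g-1) \cdot n + 1$ makes the modular sum exact. The Python sketch below draws the shared numbers $y_i$ from a seeded pseudo-random generator (an assumption made only for reproducibility); the function name `grades_protocol` is hypothetical.

```python
import random
from typing import List


def grades_protocol(secrets: List[int], g: int, rng: random.Random) -> int:
    """One round of the grades protocol; returns a = (a_0 + ... + a_{n-1}) mod M."""
    n = len(secrets)
    if not all(0 <= x < g for x in secrets):
        raise ValueError("secrets must lie in {0, ..., g - 1}")
    M = (g - 1) * n + 1
    # y[i] is the random number shared by p_i and p_{i+1}; y[i - 1] wraps for i = 0.
    y = [rng.randrange(M) for _ in range(n)]
    announcements = [(secrets[i] + y[i] - y[i - 1]) % M for i in range(n)]
    return sum(announcements) % M


secrets = [3, 0, 2, 1]  # g = 4 and n = 4, so M = 13
assert grades_protocol(secrets, 4, random.Random(7)) == sum(secrets)
```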

We consider a variant of the grades protocol where $M$ can be any power of two greater than $(g-1) \cdot n$. Observe that the anonymity and correctness properties of the original protocol also hold for this variant. To verify the anonymity property, we model an over-approximation of the protocol where the secrets are allowed to range over $\{0, \ldots, M-1\}$. This model is similar to the one we have constructed for the generalized dining cryptographers protocol except that, e.g., the XOR operations are now replaced with bitwise additions and negations. A reference system is specified such that the announcements $a_1, \ldots, a_{n-1}$ observed by the first participant $p_0$ are uniformly distributed over the values satisfying $a_0 + \cdots + a_{n-1} \bmod M = x_0 + \cdots + x_{n-1} \bmod M$.


4 Such a distribution can be obtained by (i) choosing $a_1, \ldots, a_{n-2} \in \{0, 1\}$ uniformly at random; (ii) setting $a_0 = w[0] \oplus b_0 \oplus b_{n-1}$; (iii) setting $a_{n-1} = a_0 \oplus \cdots \oplus a_{n-2} \oplus w[0] \oplus \cdots \oplus w[n-1]$.
Algorithm 1. Equivalence check for L*

Input: Candidate automaton $\mathcal{H}$ over $\Sigma \times \Sigma$, PTS $\mathfrak{G}$, and relation $E \subseteq (\Sigma \times \Sigma)^*$.
Result: NoSolution(v, w) if there is no bisimulation $R$ with $E \subseteq R$;
        PositiveCEX(v, w) if $\mathcal{H}$ should accept (v, w), but does not;
        NegativeCEX(v, w) if $\mathcal{H}$ accepts (v, w), but should not;
        Correct if $\mathcal{L}(\mathcal{H})$ is a correct bisimulation for PTS $\mathfrak{G}$ and $E \subseteq \mathcal{L}(\mathcal{H})$.

1  Check whether $E \subseteq \mathcal{L}(\mathcal{H})$, and whether $\mathfrak{G} \models \Phi(\mathcal{L}(\mathcal{H}))$ using the $\Phi$ from (3);
2  if there is a counterexample of minimal length n then
3      Compute the greatest bisimulation $R_n$ restricted to configurations of length n;
4      if there is $(v \otimes w) \in E \setminus R_n$ with $|v| = |w| = n$ then
5          Output NoSolution(v, w) and abort;
6      else if there is $(v \otimes w) \in \mathcal{L}(\mathcal{H}) \setminus R_n$ with $|v| = |w| = n$ then
7          return NegativeCEX(v, w);
8      else if there is $(v \otimes w) \in R_n \setminus \mathcal{L}(\mathcal{H})$ then
9          return PositiveCEX(v, w);
10 else
11     return Correct;


By computing a probabilistic bisimulation between the original system and the reference system, we establish the anonymity property that the grades protocol is anonymous whenever $M$ is chosen as a power of two with $M \ge (g-1) \cdot n + 1$.


6 Learning Probabilistic Bisimulations 


We propose an automata learning method to automatically compute regular probabilistic bisimulations $R$, focusing on the case of length-preserving PTSs, which covers all examples given in the previous section. The approach uses active automata learning, for instance Angluin's L* method [5] or refinements of it, to compute $R$. This approach is inspired by previous work on using active automata learning for invariant inference [18,54]. Our procedure assumes (i) as input a bounded-branching PTS $\mathfrak{G} = (S; \{\delta_a\}_{a \in \mathit{ACT}})$, as well as a length-preserving regular relation $E \subseteq (\Sigma \times \Sigma)^*$ supposed to be covered by $R$; (ii) an effective way to check the correctness of $R$, i.e., a decision procedure in the sense of Theorem 1; and (iii) a procedure to compute the greatest probabilistic bisimulation $R_n \subseteq (\Sigma \times \Sigma)^n$ for $\mathfrak{G}$ restricted to configurations of any length $n \in \mathbb{N}$. The last assumption can easily be satisfied for length-preserving PTSs. Indeed, such systems, restricted to configurations of length $n$, are finite-state, so that efficient existing methods [6,17,20,52] apply. A solution $R$ is presented as a deterministic letter-to-letter transducer, i.e., as a deterministic finite-state automaton over the alphabet $\Sigma \times \Sigma$.
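Assumption (iii) can be met with textbook partition refinement once the configurations of length $n$ have been enumerated. The Python sketch below is a deliberately naive variant, not one of the efficient algorithms cited above: it repeatedly splits blocks according to the probability mass each configuration sends into every block under every action, until no block is split. The dictionary encoding, the toy system, and the name `greatest_bisimulation` are assumptions for illustration.

```python
from fractions import Fraction
from typing import Dict, Hashable, List, Set

# action -> configuration -> successor -> probability (a missing row is the zero function)
FinitePTS = Dict[str, Dict[Hashable, Dict[Hashable, Fraction]]]


def greatest_bisimulation(states: List[Hashable], delta: FinitePTS) -> List[Set[Hashable]]:
    """Coarsest partition in which related configurations send equal mass into every block."""
    block = {s: 0 for s in states}  # start from the trivial one-block partition
    while True:
        signature = {}
        for s in states:
            per_action = []
            for a in sorted(delta):
                mass: Dict[int, Fraction] = {}
                for t, p in delta[a].get(s, {}).items():
                    mass[block[t]] = mass.get(block[t], Fraction(0)) + p
                per_action.append(frozenset((b, m) for b, m in mass.items() if m != 0))
            signature[s] = (block[s], tuple(per_action))
        # Number the distinct signatures; the new partition refines the old one.
        ids = {sig: i for i, sig in enumerate(dict.fromkeys(signature.values()))}
        new_block = {s: ids[signature[s]] for s in states}
        if len(set(new_block.values())) == len(set(block.values())):
            break  # no block was split: the partition is stable
        block = new_block
    classes: Dict[int, Set[Hashable]] = {}
    for s in states:
        classes.setdefault(block[s], set()).add(s)
    return list(classes.values())


half = Fraction(1, 2)
delta: FinitePTS = {"a": {
    "s": {"x": half, "y": half},
    "t": {"x": Fraction(3, 10), "x2": Fraction(1, 5), "y": half},
    "y": {"y": Fraction(1)},
}}
print(sorted(sorted(c) for c in greatest_bisimulation(["s", "t", "x", "x2", "y"], delta)))
# [['s', 't'], ['x', 'x2'], ['y']]
```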

Since L*-style learning requires the target language to be uniquely defined, our approach attempts to learn a representation of the greatest length-preserving probabilistic bisimulation relation $R \subseteq (\Sigma \times \Sigma)^*$, which is the unique bisimulation relation formed by the union of all length-preserving probabilistic bisimulations of $\mathfrak{G}$, i.e., $R = \bigcup_{n \in \mathbb{N}} R_n$. Because $R$ is not in general computable, the learning process might diverge and fail to produce any probabilistic bisimulation. It can also happen that learning terminates, but yields a probabilistic bisimulation relation strictly smaller than $R$.
The L* method requires a teacher that is able to answer two kinds of queries:

– membership queries, i.e., whether a pair $(v, w)$ of words should be accepted by the automaton to be learned. Since our learner tries to learn the greatest bisimulation, the teacher can answer this query by checking whether the configurations $v, w$ are bisimilar; this is done by computing the greatest bisimulation $R_{|v|}$ restricted to configurations of length $|v| = |w|$, and checking whether or not $(v, w) \in R_{|v|}$.

– equivalence queries, i.e., whether a candidate automaton $\mathcal{H}$ accepts exactly the language to be learned. Such queries can essentially be answered by checking whether the language $\mathcal{L}(\mathcal{H})$ satisfies the formula $\Phi(R)$ from (3). The complete algorithm for answering equivalence queries is given in Algorithm 1. The algorithm first attempts to find a shortest counterexample to the proof rule. If a counterexample of length $n$ is found, then the symmetric difference $\mathcal{L}(\mathcal{H}) \mathbin{\triangle} R_n$ must contain at least one pair of length $n$. Any such pair is a valid counterexample for automata learning, since the learner tries to learn the greatest bisimulation. The teacher thus reports one such pair as a positive or negative counterexample according to its membership in $R_n$.
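The case analysis of the teacher in Algorithm 1 can be sketched independently of how its oracles are implemented. In the Python sketch below, the solver call of lines 1-2 is abstracted into a hypothetical callback that returns the minimal counterexample length (or None), and the languages $E$, $\mathcal{L}(\mathcal{H})$ and $R_n$ restricted to length $n$ are assumed to be available as finite sets of pairs; none of these interfaces come from the actual prototype. The final `raise` reflects the argument above that a length-$n$ counterexample must be witnessed by some pair.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set, Tuple

Pair = Tuple[str, str]  # (v, w) with |v| = |w|


@dataclass(frozen=True)
class Verdict:
    kind: str                   # "NoSolution", "NegativeCEX", "PositiveCEX" or "Correct"
    pair: Optional[Pair] = None


def equivalence_check(min_cex_length: Callable[[], Optional[int]],
                      greatest_bisim: Callable[[int], Set[Pair]],
                      lang_H: Callable[[int], Set[Pair]],
                      lang_E: Callable[[int], Set[Pair]]) -> Verdict:
    n = min_cex_length()                   # lines 1-2 (e.g. delegated to a WS1S solver)
    if n is None:
        return Verdict("Correct")          # line 11
    R_n = greatest_bisim(n)                # line 3
    candidates = [
        ("NoSolution", lang_E(n) - R_n),   # lines 4-5
        ("NegativeCEX", lang_H(n) - R_n),  # lines 6-7
        ("PositiveCEX", R_n - lang_H(n)),  # lines 8-9
    ]
    for kind, diff in candidates:
        if diff:
            return Verdict(kind, min(diff))  # any pair is valid; min() makes it deterministic
    raise AssertionError("a length-n counterexample must be witnessed by some pair")


R2 = {("ab", "ab"), ("ab", "ba"), ("ba", "ab"), ("ba", "ba")}
verdict = equivalence_check(lambda: 2, lambda n: R2,
                            lambda n: {("ab", "ab"), ("ba", "ba")},
                            lambda n: {("ab", "ab")})
print(verdict)  # Verdict(kind='PositiveCEX', pair=('ab', 'ba'))
```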


Properties of the Learning Algorithm. The learning procedure terminates when the teacher outputs NoSolution or returns Correct for an equivalence query. In the former case, the teacher explicitly provides a pair of non-bisimilar configurations in $E$. In the latter case, the procedure computes an automaton $\mathcal{H}$ such that $E \subseteq \mathcal{L}(\mathcal{H})$ and $\mathcal{L}(\mathcal{H})$ is a correct probabilistic bisimulation (as it satisfies the proof rule based on Theorem 1), though not necessarily the greatest one. Since all counterexamples reported by the teacher are contained in $\mathcal{L}(\mathcal{H}) \mathbin{\triangle} R$, the learning procedure is guaranteed to terminate for PTSs where the greatest probabilistic bisimulation $R$ is regular.


Optimization with Inductive Invariants. There is a natural way to optimize the learning procedure by only considering a regular inductive invariant $\mathit{Inv}$ such that $\mathit{Inv}$ contains the set of reachable configurations and $E \subseteq \mathit{Inv} \times \mathit{Inv}$. The optimization is done by simply replacing the greatest finite-length bisimulations $R_i$ in Algorithm 1, and when answering membership queries, with the greatest bisimulation $R_i^{\mathit{Inv}} = R_i \cap (\mathit{Inv} \times \mathit{Inv})$ on the inductive invariant. Since $R_i^{\mathit{Inv}}$ can be a lot smaller than $R_i$, this can lead to significant speed-ups. Note that a bisimulation $R'$ on $\mathit{Inv}$ can be extended to a bisimulation $R$ on all configurations by setting $R = R' \cup \{(v, v) : v \notin \mathit{Inv}\}$. The inductive invariant $\mathit{Inv}$ may be manually specified, or automatically generated using techniques like those in [18,54].


Experimental Results and Conclusion. We have implemented a prototype in Scala to test our learning method. Given a PTS specified over $\mathfrak{U}$, our tool first translates it to WS1S formulas and obtains finite automata for these formulas using the Mona tool [30]. Our prototype then applies the L* learning procedure as described in this section, including the optimization to consider only the configurations of valid format. When answering an equivalence query, our tool invokes Mona to verify candidate automata and obtain counterexamples (lines 1-2 of Algorithm 1). We use the prototype tool to prove the anonymity property of the three protocols described in Sect. 5. The proofs generated by our tool are finite-state automata encoding the desired probabilistic bisimulation relations. The experimental results are summarized in Table 1.

Table 1. Experimental results. For each case study, we list the size of the final proof produced by our tool (#states, #trans), the time taken by Mona to verify the candidate automata, the time taken by our tool to compute the fixed-length bisimulations, and the total computation time of the learning procedure. Experiments are run on a Windows laptop with a 2.4 GHz Intel i5 processor and a 2 GB memory limit.

Case study                        | #states | #trans | Mona | Bisim | Total
----------------------------------|---------|--------|------|-------|------
Dining cryptographers, single-bit |      13 |    832 |   2s |    2s |    6s
Dining cryptographers, multi-bit  |      16 |   1024 |   3s |   24s |   28s
The grades protocol               |      25 |   1600 |   5s |   28s |   35s
Abstract. Analysis of large continuous-time stochastic systems is a
computationally intensive task. In this work we focus on population mod- 
els arising from chemical reaction networks (CRNs), which play a funda- 
mental role in analysis and design of biochemical systems. Many relevant 
CRNs are particularly challenging for existing techniques due to complex 
dynamics including stochasticity, stiffness or multimodal population dis- 
tributions. We propose a novel approach allowing not only to predict, 
but also to explain both the transient and steady-state behaviour. It 
focuses on qualitative description of the behaviour and aims at quanti- 
tative precision only in orders of magnitude. First we build a compact 
understandable model, which we then crudely analyse. As demonstrated 
on complex CRNs from literature, our approach reproduces the known 
results, but in contrast to the state-of-the-art methods, it runs with vir- 
tually no computational cost and thus offers unprecedented scalability. 


1 Introduction 


Chemical Reaction Networks (CRNs) are a versatile language widely used for 
modelling and analysis of biochemical systems [12] as well as for high-level pro- 
gramming of molecular devices [8,40]. They provide a compact formalism equiv- 
alent to Petri nets [37], Vector Addition Systems (VAS) [29] and distributed 
population protocols [3]. Motivated by numerous potential applications ranging 
from systems biology to synthetic biology, various techniques allowing simulation and formal analysis of CRNs have been proposed [2,9,21,24,39], and embodied in the design process of biochemical systems [20,25,32]. The time evolution of CRNs is governed by the Chemical Master Equation (CME), which describes the probability of the molecular counts of each chemical species. Many important biochemical systems lead to complex dynamics that include state space explosion, stochasticity, stiffness, and multimodality of the population distributions [23,44], and that fundamentally limit the class of systems the existing techniques can effectively handle. More importantly, biologists and engineers often seek plausible explanations of why the system under study does or does not exhibit the required behaviour. In many cases, a set of system simulations/trajectories or population distributions is not sufficient, and the ability to provide an accurate explanation for the temporal or steady-state behaviour is another major challenge for the existing techniques.

In order to cope with the computational complexity of the analysis and to obtain explanations of the behaviour, we shift the focus from quantitatively precise results to a more qualitative analysis, closer to how a human would perceive the system. Yet we insist on providing at least rough timing information on the behaviour as well as a rough classification of the probability of different behaviours at the granularity of "very likely", "few percent", "barely possible", so that we can draw conclusions on issues such as time to extinction or bimodality of behaviour. This gives rise to our semi-quantitative approach. We stipulate that analyses in this framework reflect quantities in orders of magnitude, both for time duration and probabilities, but not more than that. This paradigm shift is reflected at two levels: (1) we abstract systems into semi-quantitative models; (2) we analyse systems in a semi-quantitative way. While each of the two can be combined with a traditional abstraction/analysis, when combined together they provide powerful means to understand systems' behaviour with virtually no computational cost.


Semi-quantitative Models. The states of the models contain information on 
the current amount of objects of each species as an interval spanning often sev- 
eral orders of magnitude, unless instructed otherwise. For instance, if an amount 
of a certain species is to be closely monitored (as a part of the input speci- 
fication/property of the system) then this abstraction can be finer. Similarly, 
whenever the analysis of a previous version of the abstraction points to the lack 
of precision in certain states, preventing us from concluding which of the possible behaviours is prevalent, the corresponding refinement can take place. Further, the rates of the transitions are also captured only with such imprecision. The crucial point allowing for the existence of such models that are small, yet faithful, is our concept of acceleration. It captures certain sequences of transitions. It eliminates most of the non-determinism that paralyses other types of abstractions, which are too over-approximative and unable to conclude anything but safety properties.
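As a rough illustration of what an order-of-magnitude abstraction of a single population count might look like, the Python sketch below maps counts to the intervals $[0,0], [1,9], [10,99], \ldots$. The decimal boundaries and the helper names are assumptions made for illustration only; the abstraction described here additionally refines intervals on demand and accelerates transitions, neither of which this sketch attempts.

```python
from typing import List, Tuple

Interval = Tuple[int, int]


def magnitude_intervals(max_count: int, base: int = 10) -> List[Interval]:
    """Partition {0, ..., max_count} into [0, 0], [1, base-1], [base, base^2-1], ..."""
    intervals = [(0, 0)]
    low = 1
    while low <= max_count:
        intervals.append((low, min(low * base - 1, max_count)))
        low *= base
    return intervals


def abstract(count: int, intervals: List[Interval]) -> int:
    """Index of the interval containing the concrete population count."""
    for i, (low, high) in enumerate(intervals):
        if low <= count <= high:
            return i
    raise ValueError("count outside the modelled range")


intervals = magnitude_intervals(1000)
print(intervals)                  # [(0, 0), (1, 9), (10, 99), (100, 999), (1000, 1000)]
print(abstract(250, intervals))   # 3
```

A species that must be monitored closely could be given a smaller `base` or hand-picked intervals, which corresponds loosely to the finer abstraction mentioned above.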


Semi-quantitative Analysis. Instead of performing exact transient or steady- 
state analysis, we can consider most probable transitions and then carefully lift 
this to most probable temporal behaviours. Technically, this is done by alter- 
nating between transient and steady-state analysis where only some rates and 
transitions are taken into account at different stages. In order to further facilitate the human's insight into the result of the analysis, we provide an algorithm to perform this analysis with virtually no computational effort and thus possibly manually. The trivial computations immediately pinpoint why certain behaviours occur. Moreover, less likely behaviours can also be identified easily, to any desired degree of improbability (tens of percent, per mille, etc.).
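The idea of focusing on the most probable transitions can be illustrated with the standard race condition of a continuous-time Markov chain: from a given state, transition $i$ fires first with probability $r_i / \sum_j r_j$, and the expected waiting time is $1 / \sum_j r_j$. The Python sketch below reports these quantities only up to the nearest order of magnitude. It is a generic illustration, not the analysis algorithm of this paper, and the rates and names are hypothetical; all rates are assumed to be positive.

```python
import math
from typing import Dict, Tuple


def race_summary(rates: Dict[str, float]) -> Tuple[str, Dict[str, int], float]:
    """Most likely next transition, nearest order of magnitude of each
    transition's probability r_i / sum(r), and the expected waiting time."""
    total = sum(rates.values())
    orders = {name: round(math.log10(rate / total)) for name, rate in rates.items()}
    best = max(rates, key=rates.get)
    return best, orders, 1.0 / total


best, orders, wait = race_summary({"fast": 1000.0, "medium": 10.0, "slow": 0.01})
print(best, orders)  # fast {'fast': 0, 'medium': -2, 'slow': -5}
```

The wider the gaps between the rates, the more clearly one transition dominates, which is in line with the observation below that stiff systems are easier to analyse in this way.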

To summarise, the first step yields tiny models, allowing for a synoptic obser- 
vation of the model; due to their size these models can be either analysed easily 
using standard means, or can be subject to the second step. The second step 
provides an efficient approximative analysis, which is also very illustrative due 
to the limited use of quantities. It can be applied to any system; however, it is 
particularly interesting in connection with the models coming from the first step 
since (i) no extra effort (size, computation) is wasted on overly precise treatment 
that is ignored by the other step, and (ii) together they yield an understandable 
explanation of the behaviour. An entertaining feature of this paradigm is that the stiffer the system (with rates at hugely different time scales), the easier it is to analyse.

To demonstrate the capabilities of our approach, we consider three challenging and biologically relevant case studies that have been used in the literature to evaluate state-of-the-art methods for CRN analysis. It has been shown that many approaches fail, either due to time-outs or an inability to capture differences in behaviours, and that some tailored ones require considerable computational effort, e.g. an hour of computation. Our experiments clearly show that the proposed approach can deliver results that yield qualitatively the same information and more understanding, and that can be computed in minutes by hand (or within a fraction of a second by computer).


Our contribution can be summarized as follows:

– We propose a novel semi-quantitative framework for the analysis of CRNs and similar population models, focusing on explainability of the results and low complexity, with quantitative precision limited to orders of magnitude.

– An algorithm for abstracting CRNs into semi-quantitative models based on interval abstraction of the species population and on transition acceleration.

– An algorithm for semi-quantitative analysis that replaces exact numerical computation by exploring the most probable transitions and alternating transient and steady-state analysis.

– We consider three challenging CRNs thoroughly studied in the literature and demonstrate that the semi-quantitative abstraction and analysis gives us a unique tool that is able to accurately predict and explain both transient and steady-state behaviour of complex CRNs in a fraction of a second.


Related Work 


To the best of our knowledge, there does not exist any abstraction of CRNs similar to the proposed approach. Nevertheless, there exist various abstraction and approximation schemes for CRNs that improve the performance and scalability of both the simulation-based and the numerical-based techniques. In the following paragraphs, we discuss the most relevant directions and their links to our approach.
Approximate Semantics for CRNs. For CRNs including large populations
of species, fluid (mean-field) approximation techniques can be applied [5] and 
extended to approximate higher-order moments [15]: these deterministic approx- 
imations lead to a set of ordinary differential equations (ODEs). An alternative 
is to approximate the CME as a continuous-state stochastic process. The Linear 
Noise Approximation (LNA) is a Gaussian process which has been derived as an 
approximation of the CME [16,44] and describes the time evolution of expec- 
tation and variance of the species in terms of ODEs. Recently, an aggregation 
scheme over ODEs that aims at understanding the dynamics of large CRNs has 
been proposed in [10]. In contrast to our approach, the deterministic approx- 
imations cannot adequately capture the stochasticity of CRNs caused by low 
population species. 

To mitigate this drawback, various hybrid models have been proposed. The 
common idea of these models is as follows: the dynamics of low population species 
is described by the discrete stochastic process and the dynamics of large pop- 
ulation species is approximated by a continuous process. The particular hybrid 
models differ in the approximation of the large population species. In [27], a pure 
deterministic semantics for large population species is used. The moment-based 
description for medium/high-copy number species was used in [24]. The LNA
and an adaptive partitioning of the species according to leap conditions
(which is more general than partitioning based on population thresholds)
were proposed in [9]. All hybrid models have to deal with interactions between
low and large population species. In particular, the dynamics of the stochastic
process describing the low-population species is conditioned by the continuous
state describing the concentration of the large-population species. The numerical
analysis of such a conditioned stochastic process is typically a computationally
demanding task that limits scalability.

In contrast, our approach does not explicitly partition the species, but rather 
abstracts the concrete species population using an interval abstraction and tries 
to effectively capture both the stochastic and the deterministic behaviour with 
the help of the accelerated transitions. As we already emphasised, the proposed 
abstraction and analysis avoids any numerical computation of precise quantities. 


Reduction Techniques for Stochastic Models. A widely studied reduc- 
tion method for Markov models is state aggregation based on lumping [6] or 
(bi-)simulation equivalence [4], with the latter notion in its exact [33] or approx- 
imate [13] form. Approximate notions of equivalence have led to new abstrac- 
tion/refinement techniques for the numerical verification of Markov models over 
finite [14] as well as uncountably-infinite state spaces [1,41,42]. Several approx- 
imate aggregation schemes leveraging the structural properties of CRNs were 
proposed [17,34,45]. Abate et al. proposed an adaptive aggregation that gives
formal guarantees on the approximation error, but typically provides smaller state
space reductions [2]. Our approach shares the idea of abstracting the state space
by aggregating some states together. Similarly to [17,34,45], we partition the
state space based on the species population, i.e. we also introduce population
levels. In contrast to the aforementioned aggregation schemes, we propose a
novel abstraction of the transition relation based on acceleration. It allows us
to avoid the numerical solution of the approximate CME and thus achieve a better
reduction while providing an accurate prediction of the system behaviour.

Alternative methods to deal with large/infinite state spaces are based on a 
state truncation trying to eliminate insignificant states, i.e., states reached only 
with a negligible probability. These methods, including finite state projections 
[36], sliding window abstractions [26], or fast adaptive uniformisation [35], are 
able to quantify the total probability mass that is lost due to the truncation, 
but typically cannot effectively handle systems involving a stiff behaviour and 
multimodality [9]. 


Simulation-Based Analysis. Transient analysis of CRNs can be performed 
using the Stochastic Simulation Algorithm (SSA) [21]. Note that the SSA 
produces a single realisation of the stochastic process, whereas the numerical
solution of the CME gives the probability distribution of each species over time.
Although simulation-based analysis is generally faster than direct solution of the 
stochastic process underlying the given CRN, obtaining good accuracy necessi- 
tates potentially large numbers of simulations and can be very time consuming. 
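To make the contrast with the CME concrete, the following is a minimal sketch of Gillespie's direct method [21] in Python. It assumes mass-action propensities with the binomial population term of Sect. 2; the function names `propensity` and `ssa` and the degradation example ($k_d = 10^{-4}$, 20 initial molecules) are illustrative choices, not code from the paper. Each call returns a single realisation, so estimating a distribution requires many runs.

```python
import math
import random

def propensity(state, reactants, k):
    """Mass-action propensity k * prod_l binom(X_l, r_l); 0 if a reactant is missing."""
    a = k
    for x, r in zip(state, reactants):
        a *= math.comb(x, r)
    return a

def ssa(initial, reactions, t_end, rng):
    """Gillespie's direct method; returns one trajectory [(time, state), ...].

    `reactions` is a list of (reactants, products, k) with stoichiometry vectors.
    """
    t, state = 0.0, tuple(initial)
    trajectory = [(t, state)]
    while True:
        props = [propensity(state, r, k) for r, _, k in reactions]
        total = sum(props)
        if total == 0.0:                  # nothing enabled: absorbing state
            break
        t += rng.expovariate(total)       # waiting time ~ Exp(total)
        if t > t_end:
            break
        # pick reaction j with probability props[j] / total
        j = rng.choices(range(len(reactions)), weights=props)[0]
        reactants, products, _ = reactions[j]
        state = tuple(x - r + p for x, r, p in zip(state, reactants, products))
        trajectory.append((t, state))
    return trajectory

# Degradation A -> 0 with k_d = 1e-4 per second, starting from 20 molecules.
degradation = [((1,), (0,), 1e-4)]
run = ssa([20], degradation, t_end=1e5, rng=random.Random(42))
print(run[-1])   # one realisation only; other seeds give other trajectories
```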

Various partitioning schemes for species and reactions have been proposed 
for the purpose of speeding up the SSA in multi-scale systems [23,38,39]. For 
instance, Yao et al. introduced the slow-scale SSA [7], where they distinguish 
between fast and slow species. Fast species are then treated assuming they reach 
equilibrium much faster than the slow ones. Adaptive partitioning of the species 
has been considered in [19,28]. In contrast to the simulation-based analysis, our 
approach (i) provides a compact explanation of the system behaviour in the form 
of tiny models allowing for a synoptic observation and (ii) can easily reveal less 
probable behaviours. 


2 Chemical Reaction Networks 


In this paper, we assume familiarity with standard verification of (continuous- 
time) probabilistic systems, e.g. [4]. For more detail, see [11, Appendix]. 


CRN Syntax. A chemical reaction network (CRN) $\mathcal{N} = (\mathcal{A}, \mathcal{R})$ is a pair of finite
sets, where $\mathcal{A}$ is a set of species, $|\mathcal{A}|$ denotes its size, and $\mathcal{R}$ is a set of reactions.
Species in $\mathcal{A}$ interact according to the reactions in $\mathcal{R}$. A reaction $\tau \in \mathcal{R}$ is a
triple $\tau = (r_\tau, p_\tau, k_\tau)$, where $r_\tau \in \mathbb{N}^{|\mathcal{A}|}$ is the reactant complex, $p_\tau \in \mathbb{N}^{|\mathcal{A}|}$ is the
product complex and $k_\tau \in \mathbb{R}_{>0}$ is the coefficient associated with the rate of the
reaction. $r_\tau$ and $p_\tau$ represent the stoichiometry of reactants and products. Given
a reaction $\tau_1 = ([1, 1, 0], [0, 0, 2], k_1)$, we often refer to it as $\tau_1 : A_1 + A_2 \xrightarrow{k_1} 2A_3$.


CRN Semantics. Under the usual assumption of mass action kinetics, the
stochastic semantics of a CRN $\mathcal{N}$ is generally given in terms of a discrete-state,
continuous-time stochastic process $X(t) = (X_1(t), X_2(t), \ldots, X_{|\mathcal{A}|}(t)), t \geq 0$ [16].
The state change associated with the reaction $\tau$ is defined by $v_\tau = p_\tau - r_\tau$, i.e. the
state $X$ is changed to $X' = X + v_\tau$, which we denote as $X \xrightarrow{\tau} X'$. For example,
for $\tau_1$ as above, we have $v_{\tau_1} = [-1, -1, 2]$. For a reaction to happen in a state $X$,
all reactants have to be present in sufficient numbers. The reachable state space of $X(t)$,
denoted as $S$, is the set of all states reachable by a sequence of reactions from
a given initial state $X_0$. The set of reactions changing the state $X_i$ to the state
$X_j$ is denoted as $\mathit{reac}(X_i, X_j) = \{\tau \mid X_i \xrightarrow{\tau} X_j\}$.
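The syntax and state-change semantics above can be sketched in Python as follows. The `Reaction` class and the `reac` helper are hypothetical names chosen for illustration, and `k=1.0` is only a placeholder for the unspecified coefficient $k_1$.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Reaction:
    name: str
    reactants: tuple[int, ...]   # r_tau, one entry per species
    products: tuple[int, ...]    # p_tau
    k: float                     # rate coefficient k_tau > 0

    @property
    def change(self):
        """State change v_tau = p_tau - r_tau."""
        return tuple(p - r for p, r in zip(self.products, self.reactants))

    def enabled(self, state):
        """All reactants are present in sufficient numbers."""
        return all(x >= r for x, r in zip(state, self.reactants))

    def fire(self, state):
        """Successor X' = X + v_tau (assumes the reaction is enabled)."""
        return tuple(x + v for x, v in zip(state, self.change))

def reac(reactions, xi, xj):
    """reac(X_i, X_j): names of the reactions whose firing in X_i yields X_j."""
    return {t.name for t in reactions if t.enabled(xi) and t.fire(xi) == tuple(xj)}

# tau1 : A1 + A2 -> 2 A3, with a placeholder coefficient k = 1.0
tau1 = Reaction("tau1", (1, 1, 0), (0, 0, 2), k=1.0)
print(tau1.change)                          # (-1, -1, 2)
print(reac([tau1], (3, 1, 0), (2, 0, 2)))   # {'tau1'}
```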

The behaviour of the stochastic system $X(t)$ can be described by the (possibly
infinite) continuous-time Markov chain (CTMC) $\mathcal{C}(\mathcal{N}) = (S, X_0, R)$, where
the transition matrix $R(i,j)$ gives the rate of a transition from $X_i$ to $X_j$.
Formally,

$$R(i,j) = \sum_{\tau \in \mathit{reac}(X_i, X_j)} k_\tau \cdot C_{\tau,i}, \quad \text{where} \quad C_{\tau,i} = \prod_{\ell=1}^{|\mathcal{A}|} \binom{X_{i,\ell}}{r_{\tau,\ell}}$$

corresponds to the population-dependent term of the propensity function, where
$X_{i,\ell}$ is the $\ell$-th component of the state $X_i$ and $r_{\tau,\ell}$ is the stoichiometric coefficient of the
$\ell$-th reactant in the reaction $\tau$. The CTMC $\mathcal{C}(\mathcal{N})$ is the accurate representation
of the CRN $\mathcal{N}$ but, even when finite, it is not scalable in practice because of the state
space explosion problem [25,31].
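As an illustration of the rate formula, the sketch below computes a single entry $R(i,j)$ for a hypothetical dimerisation $2A \to B$ with a placeholder coefficient $k = 0.5$. It assumes the binomial population term $C_{\tau,i}$ of mass-action kinetics; the helper names are ours.

```python
import math

def population_term(state, reactants):
    """C_{tau,i} = prod_l binom(X_{i,l}, r_l): number of distinct reactant combinations."""
    c = 1
    for x, r in zip(state, reactants):
        c *= math.comb(x, r)
    return c

def rate(reactions, xi, xj):
    """R(i, j) = sum over tau in reac(X_i, X_j) of k_tau * C_{tau,i}.

    `reactions` is a list of (reactants, products, k); a reaction contributes
    only if it is enabled in X_i and its state change maps X_i to X_j.
    """
    total = 0.0
    for reactants, products, k in reactions:
        enabled = all(x >= r for x, r in zip(xi, reactants))
        succ = tuple(x - r + p for x, r, p in zip(xi, reactants, products))
        if enabled and succ == tuple(xj):
            total += k * population_term(xi, reactants)
    return total

# Hypothetical dimerisation 2A -> B with k = 0.5, from state (4, 0):
dimer = [((2, 0), (0, 1), 0.5)]
print(rate(dimer, (4, 0), (2, 1)))   # 0.5 * binom(4, 2) = 3.0
```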


3 Semi-quantitative Abstraction 


In this section, we describe our abstraction. We derive the desired CTMC con- 
ceptually in several steps, which we describe explicitly, although we implement 
the construction of the final system directly from the initial CRN. 


3.1 Over-Approximation by Interval Abstraction and Acceleration 


Given a CRN $\mathcal{N} = (\mathcal{A}, \mathcal{R})$, we first consider an interval continuous-time Markov
decision process (interval CTMDP¹), which is a finite abstraction of the infinite
$\mathcal{C}(\mathcal{N})$. Intuitively, abstract states are given by intervals on sizes of populations,
with the additional requirement that the abstraction captures the enabledness of
reactions. The transition structure follows the ideas of the standard may abstraction
and of the three-valued abstraction of continuous-time systems [30]. A technical
difference in the latter point is that we abstract rates into intervals instead
of uniformising the chain and then abstracting only the transition probabilities into
intervals; this is necessary in later stages of the process. The main difference is
that we also treat certain sequences of actions, which we call acceleration.


Abstract Domains. The first step is to define the abstract domain for the
population sizes. For every species $A \in \mathcal{A}$, we define a finite partitioning $\mathbb{A}_A$ of
$\mathbb{N}$ into intervals, reflecting the rough size of the population. Moreover, we want
the abstraction to reflect whether a reaction is enabled. Hence we require that
$\{0\} \in \mathbb{A}_A$ for the case when the coefficient of this species as a reactant is always
0 or 1; in general, for every $i < \max_{\tau \in \mathcal{R}} r_\tau(A)$ we require $\{i\} \in \mathbb{A}_A$.

The abstraction $\alpha_A(n)$ of a number $n$ of a species $A$ is then the $I \in \mathbb{A}_A$ for
which $n \in I$. The state space of $\alpha(\mathcal{N})$ is the product $\prod_{A \in \mathcal{A}} \mathbb{A}_A$ of the abstract
domains with the point-wise defined abstraction $\alpha(n)_A = \alpha_A(n_A)$.

1 Interval CTMDP is a CTMDP with lower/upper bounds on rates. Since it serves only
as an intermediate formalism to ease the presentation, we refrain from formalising
it here.

The abstract domain for the rates (given by $R$) is the set of all real
intervals.

Transitions from an abstract state are defined as the may abstraction as
follows. Since our abstraction reflects enabledness, the same set of actions is
enabled in all concrete states of a given abstract state. The targets of an action
in the abstract setting are the abstractions of all possible concrete successors, i.e.
$\mathit{succ}(s,a) := \{\alpha(n) \mid m \in s, m \xrightarrow{a} n\}$; in other words, a transition is present if it is
enabled in at least one of the respective concrete states. The abstract rate is the smallest
interval including all the concrete rates of the respective concrete transitions.
This can be easily computed by the corner-points abstraction (evaluating only
the extremal values for each species) since the mass-action rates are
monotone in the population sizes.
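A minimal sketch of the abstraction function and of the corner-point computation of abstract rate intervals follows. It assumes mass-action propensities, which are non-decreasing in every population, so the extremes sit at the lower and upper corners of the abstract state. The single-species partition [0], [1..5], [6..20], [21..∞) is borrowed from the degradation example below; the function names are hypothetical.

```python
import math

INF = math.inf
# Partition of N for one species, taken from the degradation example.
PARTITION = [(0, 0), (1, 5), (6, 20), (21, INF)]

def alpha(n, partition=PARTITION):
    """Abstraction of a population n: the interval of the partition containing n."""
    for lo, hi in partition:
        if lo <= n <= hi:
            return (lo, hi)
    raise ValueError(f"{n} is not covered by the partition")

def propensity(state, reactants, k):
    """Mass-action rate k * prod_l binom(x_l, r_l)."""
    a = k
    for x, r in zip(state, reactants):
        a *= math.comb(x, r)
    return a

def rate_interval(box, reactants, k):
    """Smallest interval containing the concrete rates of one reaction.

    `box` is an abstract state, i.e. one (lo, hi) interval per species.
    Corner points suffice because the rate is monotone in every population.
    """
    low = propensity([lo for lo, _ in box], reactants, k)
    if any(r > 0 and math.isinf(hi) for (_, hi), r in zip(box, reactants)):
        return (low, INF)                 # unbounded population of a reactant
    # species with r = 0 contribute a factor 1, so their upper bound is irrelevant
    upper = [hi if r > 0 else lo for (lo, hi), r in zip(box, reactants)]
    return (low, propensity(upper, reactants, k))

print(alpha(13))                               # (6, 20)
print(rate_interval([(6, 20)], (1,), 1e-4))    # approximately (6e-4, 2e-3)
```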


High Level of Non-determinism. The (more or less) standard style of the
abstraction above has several drawbacks, mostly related to the high degree of
non-determinism for rates, which we subsequently discuss.

Firstly, in connection with the abstract population sizes, transitions to
different sizes only happen non-deterministically, leaving us unable to determine
which behaviour is probable. For example, consider the simple system given by
$A \xrightarrow{k_d} \emptyset$ with $k_d = 10^{-4}$, so that the degradation of a single molecule happens on average every $10^4$ seconds.
Assume the population discretisation into [0], [1..5], [6..20], [21..∞) with the abstraction
depicted in Fig. 1. While the original system very probably moves from [6..20] to
[1..5] in less than $15 \cdot 10^4$ seconds, the abstraction cannot even say
that this happens, not to speak of estimating the time.


Fig. 1. Above: Interval CTMDP abstraction with intervals on rates and non-
determinism. Below: Interval CTMC abstraction arising from acceleration.


Acceleration. To address this issue, we drop the non-deterministic self-loops
and transitions to higher/lower populations in the abstract system.² Instead,
we “accelerate” their effect: we consider sequences of these actions that in the
concrete system have the effect of changing the population level. In our example
above, we need to take the transition 1 to 15 times from [6..20], with various
rates depending on the current concrete population, in order to get to [1..5].
This makes the precise timing more complicated to compute. Nevertheless, the
expected time can be approximated easily: here it ranges from $\frac{1}{6} \cdot 10^4 \approx 0.17 \cdot 10^4$
(for population 6) to roughly $(\frac{1}{6} + \frac{1}{7} + \cdots + \frac{1}{20}) \cdot 10^4 \approx 1.3 \cdot 10^4$ (for population 20).
This results in an interval CTMC.³

2 One can also preserve the non-determinism for the special case when one of the
transitions leads to a state where some action ceases to be enabled. While this adds
more precision, the non-determinism in the abstraction makes it less convenient to
handle.
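The range of expected times for the accelerated degradation can be checked with exact rational arithmetic. This is a sketch under the assumptions of the example ($A \xrightarrow{k_d} \emptyset$, $k_d = 10^{-4}$, leaving the interval [6..20] downwards); the helper names are hypothetical.

```python
from fractions import Fraction

K_D = Fraction(1, 10_000)   # k_d = 10^-4 per second, as in the example

def steps_to_leave(n, lower):
    """Number of degradations needed to bring population n below `lower`."""
    return n - lower + 1

def expected_time_to_leave(n, lower):
    """Exact expected time for A -> 0 to bring the population from n below `lower`.

    At population i the next degradation fires at rate i * k_d, so its expected
    waiting time is 1 / (i * k_d); the times for i = n, n-1, ..., lower add up.
    """
    return sum(1 / (i * K_D) for i in range(lower, n + 1))

for n in (6, 20):
    # roughly 0.17e4 s for population 6 and 1.3e4 s for population 20
    print(n, steps_to_leave(n, 6), float(expected_time_to_leave(n, 6)))
```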


Concurrency in Acceleration. Due to their higher number of occurrences, the
accelerated transitions can be considered continuous or deterministic, as opposed to
the discrete stochastic changes, as distinguished in the hybrid approach. The usual
differential equation approach would also take into account other reactions that
are modelled deterministically and would combine their effect into one equation.
In order to simplify the exposition and computation and, as we see later,
without much loss of precision, we can consider only the fastest change (or,
non-deterministically, more of them if their rates are similar).⁴


3.2 Operational Semantics: Concretisation to a Representative 


The next disadvantage of the classical abstraction philosophy, manifested in the
interval CTMC above, is that the precise-valued intervals on rates imply a high
computational effort during the analysis. Although the system is smaller,
standard transient analysis is still quite expensive.


Concretisation. In order to deal with this issue, the interval can be approximated
roughly by the expected time it would take for an average population in
the considered range; in our example, the “average” representative is 13. Then
the first transition occurs with rate $13 \cdot 10^{-4} \approx 10^{-3}$ and needs to happen 7
times, yielding the expected time $\frac{7}{13} \cdot 10^4 \approx 0.5 \cdot 10^4$ (ignoring even the precise
slow-downs in the rates as the population decreases). Already this very rough
computation is within a factor of 3 for all the populations in this
interval, thus yielding the correct order of magnitude with virtually no effort.
We lift the concretisation naturally to states and denote the concretisation of
an abstract state $s$ by $\gamma(s)$. The complete procedure is depicted in Algorithm 1.
The concretisation is one of the main points where we deliberately drop a
lot of quantitative information, while still preserving enough to detect large
quantitative differences. Of course, the precision improves with more precise
abstract domains and also with larger differences between the original rates.


3 The waiting times are not distributed according to the rates in the intervals. It is only
the expected waiting time (reciprocal of the rate) that is preserved. Nevertheless, for
ease of exposition, instead of labelling the transitions with expected waiting times,
we stick to the CTMC style with the reciprocals and formally treat the label as if it
were a real rate.

4 Typically, the classical concurrency diamond appears and the effect of the other
accelerated reactions happens just after the first one.
Algorithm 1. Semi-quantitative abstraction CTMC $\alpha(\mathcal{N})$

1: $\mathbb{A} \gets \prod_{A \in \mathcal{A}} \mathbb{A}_A$                          ▷ States
2: for $a \in \mathbb{A}$ do                                          ▷ Transitions
3:   $c \gets \gamma(a)$                                              ▷ Concrete representative
4:   for each $\tau$ enabled in $c$ do
5:     $r \gets$ rate of $\tau$ in $c$                                 ▷ According to $R$
6:     $a' \gets \alpha(c + v_\tau)$                                    ▷ Successor
7:     set $a \xrightarrow{\tau} a'$ with rate $r$
8:   for each self-loop $a \xrightarrow{\tau} a$ do                         ▷ Accelerate self-loops
9:     $n_\tau \gets \min\{n \mid \alpha(c + n \cdot v_\tau) \neq a\}$     ▷ Number of $\tau$ needed to change the abstract state
10:    $a' \gets \alpha(c + n_\tau \cdot v_\tau)$                          ▷ Acceleration successor
11:    instead of the self-loop with rate $r$, set $a \xrightarrow{\tau} a'$ with rate $r/n_\tau$
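The following Python sketch mirrors Algorithm 1 for CRNs given as lists of reactions. Two points are our own assumptions rather than details taken from the paper: the representative $\gamma(a)$ is the integer midpoint of each interval (giving 13 for [6..20], as in the text), with an unbounded interval first cut at a user-supplied bound; and an accelerated transition taking $n_\tau$ occurrences at rate $r$ is labelled $r/n_\tau$, so that its expected time $n_\tau/r$ is preserved. The partitions must cover $\mathbb{N}$, and the bound 100 in the usage example is hypothetical.

```python
import math
from itertools import product

def alpha(state, partitions):
    """Point-wise abstraction: map each population to the index of its interval."""
    return tuple(
        next(i for i, (lo, hi) in enumerate(part) if lo <= x <= hi)
        for x, part in zip(state, partitions)
    )

def gamma(abstract, partitions, bounds):
    """Concrete representative: integer midpoint of each interval.

    ASSUMPTION: an unbounded interval [lo, inf) is first cut at the
    user-supplied upper bound for that species.
    """
    return tuple((part[i][0] + min(part[i][1], bound)) // 2
                 for i, part, bound in zip(abstract, partitions, bounds))

def propensity(state, reactants, k):
    """Mass-action rate k * prod_l binom(x_l, r_l)."""
    a = k
    for x, r in zip(state, reactants):
        a *= math.comb(x, r)
    return a

def semi_quantitative_abstraction(reactions, partitions, bounds, max_steps=10_000):
    """Sketch of Algorithm 1; returns {(abstract_state, reaction): (target, rate)}."""
    transitions = {}
    for a in product(*(range(len(part)) for part in partitions)):  # abstract states
        c = gamma(a, partitions, bounds)                           # representative
        for name, reactants, products, k in reactions:
            if any(x < r for x, r in zip(c, reactants)):           # not enabled in c
                continue
            rate = propensity(c, reactants, k)
            v = [p - r for p, r in zip(products, reactants)]       # state change v_tau
            target = alpha([x + d for x, d in zip(c, v)], partitions)
            if target != a:
                transitions[(a, name)] = (target, rate)
                continue
            # Self-loop: find the smallest n with alpha(c + n * v) != a.
            for n in range(2, max_steps + 1):
                state = [x + n * d for x, d in zip(c, v)]
                if min(state) < 0:
                    break                  # left the valid state space
                target = alpha(state, partitions)
                if target != a:
                    # n occurrences at rate `rate` take n / rate on average
                    transitions[(a, name)] = (target, rate / n)
                    break
            # if no such n is found, the self-loop is simply dropped in this sketch
    return transitions

parts = [[(0, 0), (1, 5), (6, 20), (21, math.inf)]]
deg = [("deg", (1,), (0,), 1e-4)]          # A -> 0 with k_d = 1e-4
for (src, name), (dst, rate) in sorted(semi_quantitative_abstraction(deg, parts, [100]).items()):
    print(src, "->", dst, f"{rate:.3g}")
```

For the degradation example, this sketch leaves the interval [6..20] (representative 13) after eight degradations, so its accelerated transition to [1..5] gets the rate $13 \cdot 10^{-4}/8$.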


It remains to determine the representative for the unbounded interval. In
order to avoid infinity, we require an additional input for the analysis, namely
assumed upper bounds on the possible population of each species. In cases when an
upper bound is hard to assume, we can analyse the system with an arbitrary one
and check whether the last interval is reachable with significant probability. If so,
we use this upper bound as a new point in the interval partitioning and
try a higher upper bound next time. In general, such conditions can be checked
in the abstraction, and their violation implies a recommendation to refine the
abstract domains accordingly.


Orders-of-Magnitude Abstraction. Such an approximation is thus sufficient
to determine, most of the time, whether the acceleration (sequence of actions)
happens sooner or later than, e.g., another reaction with rate $10^{-3}$ or $10^{-2}$. Note
that this decision gets more precise not only as we refine the population levels,
but also as the system gets stiffer (the concrete values of the rates differ more),
i.e. precisely for systems that are normally harder to analyse. For ease of presentation
in our case studies, we depict only the magnitude of the rates, i.e. the decadic
logarithm rounded to an integer.


Non-determinism and Refinement. If two rates are close to each other, say 
of the same magnitude (or difference 1), such a rough computation (and rough 
population discretisation) is not precise enough to determine which of the reac- 
tions happens with high probability sooner. Both may be happening roughly at 
the same pace, or with more information we could conclude one of them is con- 
siderably faster. This introduces an uncertainty, showing different behaviours are 
possible depending on the exact quantities. This indicates points where
refinement might be needed if more precise results are required. For instance, with
rates of magnitudes 2 and 3, the latter should be happening most of the time, the
former only with a few percent chance. If we want to know whether it is rather
tens of percent or tenths of percent, we should refine the abstraction.
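The orders-of-magnitude comparison can be phrased as a tiny decision procedure. In this sketch, `margin` is a hypothetical parameter: magnitude differences up to `margin` are reported as undecided, i.e. as candidates for refinement. The accelerated rate $2 \cdot 10^{-4}$ is the reciprocal of the expected time $0.5 \cdot 10^4$ estimated for the degradation example.

```python
import math

def magnitude(rate):
    """Order of magnitude: decadic logarithm of the rate, rounded to an integer."""
    return round(math.log10(rate))

def faster(rate_a, rate_b, margin=1):
    """Return 'a' or 'b' if its magnitude exceeds the other's by more than
    `margin`; otherwise 'undecided' (a candidate for refinement)."""
    diff = magnitude(rate_a) - magnitude(rate_b)
    if diff > margin:
        return "a"
    if diff < -margin:
        return "b"
    return "undecided"

accelerated = 1 / 0.5e4                    # expected time 0.5e4 s -> rate 2e-4
print(magnitude(accelerated))              # -4
print(faster(accelerated, 1e-2))           # b: the other reaction is 2 orders faster
print(faster(accelerated, 1e-3))           # undecided with the default margin of 1
```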
4 Semi-quantitative Analysis


In this section, we present an approximate analysis technique that describes
the most probable transient and steady-state behaviour of the system (also with
rough timing) and, on demand, also the less probable behaviours (by one or more
orders of magnitude). As such, it is robust in the sense that it is well suited to work
with imprecise rates and populations. It is computationally easy (it can be done
by hand in the time that other methods require on a computer), while still yielding
significant quantitative results (“in orders of magnitude”). It does not provide
exact error guarantees, since computing them would be almost as expensive as
the classical analysis. It only features trivial limit-style bounds: if the population
abstraction gets more and more refined, the probabilities converge to those of the
original system; further, the higher the separation between the rate magnitudes,
the more precise the approximation, since the other factors (and thus the
incurred imprecisions) play a less significant role.

Intuitively, the main idea—similar to some multi-rate simulation techniques 
for stiff systems—is to “simulate” “fast” reactions until the steady state and 
then examine which slower reactions take place. However, “fast” does not mean 
faster than some constant, but faster than other transitions in a given state. 
In other words, we are not distinguishing fast and slow reactions, but tailor 
this to each state separately. Further, “simulation” is not really a stochastic 
simulation, but a deterministic choice of the fastest available transition. If a 
transition is significantly faster than others then this yields what a simulation 
would yield. When there are transitions with similar rates, e.g. with at most one 
order of magnitude difference, then both are taken into account as described in 
the following definition. 


Pruned System. Consider the underlying graph of the given CTMC. If we keep
only the outgoing transitions with the maximum rate in each state, we call the
result pruned. If there is always (at most) one transition, then the graph consists
of several paths leading to cycles. In general, when more transitions are kept, it
has bottom strongly connected components (bottom SCCs, BSCCs) and some
transient parts.

We generalise this concept to n-pruning, which preserves all transitions with
a rate that is not more than n orders of magnitude smaller than the maximum
rate in the state. Then the pruning above is 0-pruning; 1-pruning also preserves
transitions happening up to 10 times slower, which can thus still happen with a
probability of tens of percent; 2-pruning is relevant for an analysis where behaviour
occurring with a probability of a few percent is also tracked, etc.
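A sketch of n-pruning and of BSCC detection on an explicit graph, assuming the graph is a dict mapping every state to its successors with rates. The three-state graph is hypothetical. BSCCs are found by plain reachability (a state lies in a BSCC iff every state it reaches can reach it back), which is simpler, though slower, than a linear-time SCC algorithm such as Tarjan's, and adequate for the tiny abstract models considered here.

```python
import math

def prune(graph, n=0):
    """n-pruning: in each state keep the transitions whose rate is at most n
    orders of magnitude below the maximum outgoing rate of that state.

    `graph` maps every state to a dict {successor: rate}.
    """
    pruned = {}
    for s, succs in graph.items():
        top = max(succs.values(), default=None)
        pruned[s] = {t: r for t, r in succs.items() if math.log10(top / r) <= n}
    return pruned

def reachable(graph, s):
    """All states reachable from s (including s) by depth-first search."""
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        for v in graph.get(u, {}):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def bsccs(graph):
    """Bottom SCCs: s is in a BSCC iff every state reachable from s reaches s back."""
    reach = {s: reachable(graph, s) for s in graph}
    result = []
    for s in graph:
        if all(s in reach[t] for t in reach[s]):
            component = frozenset(reach[s])
            if component not in result:
                result.append(component)
    return result

# Hypothetical graph: state -> {successor: rate}
g = {"a": {"b": 100.0, "c": 5.0}, "b": {"c": 10.0, "a": 1.0}, "c": {"b": 10.0}}
print(bsccs(prune(g, 0)))   # one BSCC {b, c}; a is transient
print(bsccs(prune(g, 1)))   # the slower b -> a is kept, so {a, b, c} is one BSCC
```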


Algorithm Idea. Here we explain the idea of Algorithm 2. The transient parts 
of the pruned system describe the most probable behaviour from each state until 
the point where visited states start to repeat a lot (steady state of the pruned 
system). In the original system, the usual behaviour is then to stay in this SCC 
C until one of the pruned (slower) reactions occurs, say from state s to state t. 
This may bring us to a different component of the pruned graph and the analysis 
process repeats. However, t may also bring us back into C, in which case we stay 
in the steady state, which is basically the same as without the transition from
s to t. Further, t might be in the transient part leading to C, in which case
these states are added to C and the steady state changes a bit, spreading the
distribution slightly also to the previously transient states. Finally, t might
lead us into a component D that this run visited before C. In
that case, the steady-state distribution spreads over all the components visited
between D and C, putting a probability mass on each with a different order of
magnitude depending on all the (magnitudes of) sojourn times in the transient
and steady-state phases on the way.

Using the macros defined in the algorithm, the correctness of the compu- 
tations can be shown as follows. For the time spent in the transient phase 
(line 16), we consider the slowest sojourn time on the way times the number 
of such transitions; this is accurate since the other times are by order(s) of mag- 
nitude shorter, hence negligible. The steady-state distribution on a BSCC of the 


Algorithm 2. Semi-quantitative analysis

1: W ← ∅                                                        ▷ worklist of SCCs to process
2: add {initial state} to W and assign iteration 0 to it        ▷ artificial SCC to start the process
3: while W ≠ ∅ do
4:   C ← pop W
     ▷ Compute and output steady state or its approximation
5:   steady-state of C is approximately minStayingRate/(m · stayingRate(·))
6:   if C has no exits then continue                           ▷ definitely bottom SCC, final steady state
     ▷ Compute and output exiting transitions and the time spent in C
7:   exitStates ← argmin_{s ∈ C} (stayingRate(s)/exitingRate(s))    ▷ Probable exit points
8:   minStayingRate ← minimum rate in C, m ← #occurrences there
9:   timeToExit ← stayingRate(s) · m/(|exitStates| · minStayingRate · exitingRate(s))
       for (arbitrary) s ∈ exitStates
10:  for all s ∈ exitStates do                                  ▷ Transient analysis
11:    t ← target of the exiting transition
12:    T ← SCCs reachable in the pruned graph from t
13:      thereby newly reached transitions get assigned iteration of C + 1
14:    for D ∈ T do
         ▷ Compute and output time to get from t to D
15:      minRate ← minimum rate on the way from t to D, m ← #occurrences there
16:      transTime ← m/minRate
         ▷ Determine the new SCC
17:      if D = C then                                          ▷ back to the current SCC
18:        add to W the union of C and the new transient path π from t to C
19:        in later steady-state computation, the states of π will have probability
             smaller by a factor of stayingRate(s)/exitingRate(s)
20:      else if D was previously visited then                  ▷ alternating between different SCCs
21:        add to W the merge of all SCCs visited between D and C (inclusively)
22:        in later steady-state computation, reflect all timeToExit and transTime
             between D and C
23:      else                                                   ▷ new SCC
24:        add D to W
MACROS:

stayingRate(s) is the rate of transitions from s in the pruned graph
exitingRate(s) is the maximum rate of transitions from s not in the pruned graph
pruned graph can be approximated by minStayingRate/(m · stayingRate(·))
on line 5. Indeed, it corresponds to the steady-state distribution if the BSCC is a
cycle and minStayingRate is significantly smaller than the other rates in the BSCC,
since then the return time for the states is approximately m/minStayingRate
and the sojourn time in s is 1/stayingRate(s). The component is exited from s with
the proportion given by its steady-state distribution times the probability to
take the exit during that time. The former is approximated above; the latter
can be approximated by the density in 0, i.e. by exitingRate(s), since the staying
rate is significantly faster. Hence the candidates for exiting are those maximising
exitingRate(·)/stayingRate(·), as on line 7. There are |exitStates| candidates for
exit, and the time to exit the component by a particular candidate s is the
expected number of visits before exit, i.e. stayingRate(s)/exitingRate(s), times
the return time m/minStayingRate; since all |exitStates| candidates compete
equally, the component is left |exitStates| times sooner, hence the expression on line 9.


Fig. 2. Alternating transient and steady-state analysis.


For example, consider the system in Fig. 2. Iteration 1 reveals the part
with solid lines with two (temporary) BSCCs $\{t\}$ and $\{s_1, s_2, s_3\}$. The former
turns out to be definitely bottom. The latter has a steady state proportional to
$(10^{-1}, 10^{-1}, 100^{-1})$. Its most probable exits are the dashed ones, identified in the
subsequent iteration 2, probable proportionally to $(1/10, 10/100)$; the expected
time to take them is $10 \cdot 2/(2 \cdot 10 \cdot 1) = 1 = 100 \cdot 2/(2 \cdot 10 \cdot 10)$. The latter leads
back to the current SCC and does not change the set of BSCCs (hence in our
examples below we often either skip or merge such iterations for the sake of
readability). In contrast, the former leads to a previous SCC; thereafter $\{s_1, s_2, s_3\}$ is
no longer a bottom SCC and consequently the third exit to u is not even analysed.
Nevertheless, it could still happen with minor probability, which can be seen if
we consider 1-pruning instead.
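The quantities of lines 5, 7 and 9 of Algorithm 2 can be recomputed for this example. Since Fig. 2 is only described in the text, we assume staying rates 10, 10 and 100 for $s_1, s_2, s_3$ (consistent with the steady state proportional to $(10^{-1}, 10^{-1}, 100^{-1})$) and exit rates 1 from $s_1$ and 10 from $s_3$ (consistent with the proportions $(1/10, 10/100)$); the function name is hypothetical. Note that line 5 yields unnormalised weights.

```python
import math

def bscc_summary(staying, exiting):
    """Lines 5 and 7-9 of Algorithm 2 for one BSCC of the pruned graph.

    staying[s]: rate of the pruned transition leaving s within the BSCC;
    exiting[s]: maximum rate of a non-pruned transition from s (absent if none).
    """
    min_staying = min(staying.values())
    m = sum(1 for r in staying.values() if math.isclose(r, min_staying))
    # line 5: approximate (unnormalised) steady state
    steady = {s: min_staying / (m * r) for s, r in staying.items()}
    # line 7: probable exit points minimise stayingRate/exitingRate
    ratios = {s: staying[s] / exiting[s] for s in staying if exiting.get(s, 0) > 0}
    best = min(ratios.values())
    exit_states = [s for s, q in ratios.items() if math.isclose(q, best)]
    # line 9: expected time to exit, evaluated for an arbitrary exit state
    s = exit_states[0]
    time_to_exit = staying[s] * m / (len(exit_states) * min_staying * exiting[s])
    return steady, exit_states, time_to_exit

steady, exits, tte = bscc_summary(
    staying={"s1": 10.0, "s2": 10.0, "s3": 100.0},
    exiting={"s1": 1.0, "s3": 10.0},
)
print(steady, exits, tte)
```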


5 Experimental Evaluation and Discussion 


In order to demonstrate the applicability and accuracy of our approach, we
selected the following three biologically relevant case studies: (1) a stochastic
model of gene expression [22,24], (2) Goutsias’s model [23] describing transcription
regulation of a repressor protein in bacteriophage λ, and (3) a viral infection
model [43].

Although the underlying CRNs are quite small (up to 5 species and 10 reactions),
their analysis is very challenging: (i) the stochasticity has a strong impact
on the dynamics of these systems and thus purely deterministic approximations
via ODEs are not accurate, (ii) the systems include species with low, medium,
and high populations and thus the resulting state space of the stochastic process
is prohibitively large to perform precise numerical analysis, and existing
reduction/approximation techniques are not sufficient (they are either too imprecise
or do not provide sufficient reduction factors), and (iii) the system dynamics
leads to bimodal distributions and/or is affected by stiff reactions.

These models thus represent perfect candidates for evaluating advanced
approximation methods, including various hybrid approaches [9,24,27]. Although
these approaches can handle the models, they typically require tens of minutes
or hours of computation time. Similarly, simulation-based methods are very time
consuming, especially in the case of very stiff CRNs, represented by the viral infection
model. We demonstrate that our approach provides accurate predictions of the
system behaviour and is feasible even when performed manually by a human.

Recall that the algorithm that builds the abstract model of the given CRN
takes as input two vectors representing the population discretisation and the
population bounds. We generally assume that these inputs are provided by users
who have a priori knowledge about the system (e.g. the orders of magnitude in which
the species populations occur) and that the inputs also reflect the level of detail
the users are interested in. In the following case studies, however, we set the inputs
based only on the rate orders of the reactions affecting the particular species
(unless mentioned otherwise).


5.1 Gene Expression Model 


The CRN underlying the gene expression model is described in Table 1. As
discussed in [24] and experimentally observed in [18], the system oscillates between
two phases characterised by the $D_{on}$ state and the $D_{off}$ state, respectively.
Biologists are interested in how the distribution of the $D_{on}$ and $D_{off}$ states is aligned
with the distribution of RNA and proteins P, and how the correlation among
the distributions depends on the DNA switching rates.

The state vector of the underlying CTMC is given as $[P, RNA, D_{off}, D_{on}]$. We
use very relaxed bounds on the maximal populations, namely the bound 1000
for P and 100 for RNA. Note the DNA invariant $D_{on} + D_{off} = 1$. As in [24], the
initial state is given as [10, 4, 1, 0].

We first consider the slow switching rates that lead to more complicated
dynamics, including bimodal distributions. In order to demonstrate the
refinement step and its effect on the accuracy of the model, we start with a
very coarse abstraction. It distinguishes only the zero population and the non-zero
populations and thus it is not able to adequately capture the relationship
between the DNA state and the RNA/P population. The pruned abstract model
obtained using Algorithms 1 and 2 is depicted in Fig. 3 (left). The full model before
pruning is shown in Fig. 6 [11, Appendix].

The proposed analysis of the model identifies the key trends in the system
dynamics. The red transitions, representing iterations 1-3, capture the most
probable paths in the system. The green component includes states with DNA on


488 M. Češka and J. Křetínský 


Table 1. Gene expression. For slow DNA switching, $r_1 = r_2 = 0.05$. For fast DNA
switching, $r_1 = r_2 = 1$. The rates are in $h^{-1}$.


Dor 25 Don Don > Dor De 48 Don + RNA RNA + @ 
RNA + RNA+P P40 Pei] P + Don 


Fig. 3. Pruned abstraction for the gene expression model using the coarse population
discretisation (left) and after the refinement (right). The state vector is $[P, RNA, D_{off}, D_{on}]$.


(i.e. $D_{on} = 1$) where the system oscillates. The component is reached via the
blue state with $D_{off}$ and no RNAs/P. The blue state is promptly reached from
the initial state and then the system waits (roughly 100 h according to our rate
abstraction) for the next DNA activation. The oscillation is left via a deactivation
in iteration 4 (the blue dotted transition)⁵. The estimation of the exit
time computed using Algorithm 2 is also 100 h. The deactivation is then followed
by fast red transitions leading to the blue state, where the system waits for the
next activation. Therefore, we obtain an oscillation between the blue state and
the green component, representing the expected oscillation between the $D_{on}$ and
$D_{off}$ states.

As expected, this abstraction does not clearly predict the bimodal distribution
of the RNA/P populations, as the trivial population levels do not bear
any information besides reaction enabledness. In order to obtain a more accurate
analysis of the system, we refine the population discretisation using a single-level
threshold for P and RNA, equal to 100 and 10, respectively (the rates in
the CRN indicate that the population of P reaches higher values).

Figure 3 (right) depicts the pruned abstract model with the new discretisation
(the full model is depicted in Fig. 7 [11, Appendix]). We again obtain the
oscillation between the green component representing $DNA_{on}$ states and the
blue $DNA_{off}$ state. The states in the green component more accurately predict
that in the $DNA_{on}$ states the populations of RNA and P are high and drop
to zero only for short time periods. The figure also shows orange transitions
within iteration 2 that extend the green component by two states. Note that
the system promptly returns from these states back to the green component.
After the deactivation in iteration 4, the system takes (within the same
iteration) the fast transitions (solid blue) leading to the blue component, where
the system waits for another activation and where the mRNA/protein populations
decrease. The expected time spent in states on the blue solid transitions is small and
thus we can reliably predict the bimodal distribution of the mRNA/P populations
and its correlation with the DNA state. The refined abstraction also reveals
that the switching time from the $DNA_{on}$ mode to the $DNA_{off}$ mode is shorter.
These predictions are in accordance with the results obtained in [24]. See Fig. 8
[11, Appendix], which is adopted from [24] and illustrates these results.

5 In Fig. 3, the dotted transitions denote exit transitions representing the
deactivations.

To further test the accuracy of our approach, we consider fast switching between the DNA states. We follow the study in [24] and increase the rates by two orders of magnitude. We use the refined population discretisation and obtain an abstraction very similar to that in Fig. 3 (right). We again obtain the oscillation between the green component (DNA_on states and nonzero RNA/protein populations) and the blue state (DNA_off and zero RNA/protein populations). The only difference lies in the transition rates corresponding to activation and deactivation, which make the switching between the components much faster. As a consequence, the system spends a longer period in the blue transient states with D_off and nonzero RNA/protein populations. The time spent in these states decreases the correlation between the DNA state and the RNA/protein populations as well as the bimodality in the population distribution. This is again in accordance with [24].

To conclude this case study, we observe close agreement between the results obtained using our approach and the results in [24] obtained via advanced and time-consuming numerical methods. We would like to emphasise that our abstraction and its solution are obtained within a fraction of a second, while the numerical methods have to approximate solutions of equations describing high-order conditional moments of the population distributions. As [24] does not report the runtime of the analysis and the implementation of their methods is not publicly available, we cannot directly compare the time complexity.


5.2 Goutsias’s Model 


Goutsias’s model, shown in Table 2, is widely used for evaluating various numerical and simulation-based techniques. As shown, e.g., in [23], the system exhibits with high probability the following transient behaviour. In the first phase, the system switches at a high rate between the non-active DNA (denoted DNA) and the active DNA (DNA.D). During this phase, the populations of RNA, monomers (M) and dimers (D) gradually increase (with only negligible oscillations). After around 15 min, the DNA is blocked (DNA.2D) and the population of RNA decreases while the populations of M and D are relatively stable. After all RNA degrades (around another 15 min), the system switches to the third phase, where the populations of M and D slowly decrease. Further, there is a non-negligible probability that the DNA is blocked at the beginning while the population of RNA is still small, and the system promptly dies out.

490 M. Češka and J. Křetínský

Table 2. Goutsias’ Model. The rates are in s⁻¹.

RNA --0.043--> RNA + M
M --7×10⁻⁴--> ∅
RNA --4×10⁻³--> ∅
DNA + D --0.02--> DNA.D
DNA.D --0.48--> DNA + D
DNA.D + D --2×10⁻⁴--> DNA.2D
M + M --0.083--> D
D --0.5--> M + M
DNA.2D --9×10⁻¹²--> DNA.D + D
DNA.D --0.072--> RNA + DNA.D

Fig. 4. Pruned abstraction for the Goutsias’ model. The state vector is [M + D, RNA, DNA, DNA.D, DNA.2D].

Although the system is well suited to hybrid approaches (there is no strong bimodality and only limited stiffness), the analysis still takes 10 to 50 min depending on the required precision [27]. We demonstrate that our approach is able to accurately predict the main transient behaviour as well as the non-negligible probability that the system promptly dies out.

The state vector is given as [M, D, RNA, DNA, DNA.D, DNA.2D] and the initial state is set to [2, 6, 0, 1, 0, 0] as in [27]. We start our analysis with a coarse population discretisation with a single threshold 100 for M and D and a single threshold 10 for RNA. We use relaxed bounds, in particular, 1000 for M and D, and 100 for RNA. Note that these numbers were selected solely based on the rate orders of the relevant reactions. Note also the DNA invariant DNA + DNA.D + DNA.2D = 1.

Figure 4 illustrates the pruned abstract model we obtained (the full model is depicted in Fig. 9 [11, Appendix]). For a better visualisation, we merged the state components corresponding to M and D into one component with M + D. As the dimerisation is fast and reversible, the actual distribution of the population between M and D does not affect the transient behaviour we are interested in.

The analysis of the model shows the following transient behaviour. The purple dotted loop in iteration i1 represents (de-)activation of the DNA. The expected exit time of this loop is 100 s. According to our abstraction, there are two options (with the same probability) to exit the loop: (1) the path a represents the DNA blocking followed by the quick extinction and (2) the path b corresponds to the production of RNA and is followed by the red loop in iteration i2 that again represents (de-)activation of the DNA. Note that according to our abstraction, this loop contains states with the populations of M/D as well as RNA up to 100 and 10, respectively.

The expected exit time of this loop is again 100 s and there are two ways to leave the loop: (1) the path within iteration i3 (taken with probability roughly 90%) again represents the DNA blocking and is followed by the extinction of RNA and consequently by the extinction of M/D in about 1000 s, and (2) the path within iteration i5 (shown in the full graph in Fig. 9 [11, Appendix]), taken with probability roughly 10%, represents the series of protein productions and leads to the states with a high number of proteins (above 100 in our population discretisation). Afterwards, there is again a series of DNA (de-)activations followed by the DNA blocking and the extinction of RNA. As before, this leads to the extinction of M/D in about 1000 s.

Although this abstraction already shows the transient behaviour leading to the extinction in about 30 min, it introduces the following inaccuracies with respect to the known behaviour: (1) the probability of the fast extinction is higher and (2) we do not observe the clear bell-shaped pattern on the RNA (i.e. level 2 for the RNA is not reached in the abstraction). As in the previous case study, the problem is that the population discretisation is too coarse. As a result, the total rate of the DNA blocking (affected by the M/D population via the mass-action kinetics) is too high in the states with M/D population level 1. This can be directly seen in the interval CTMC representation, where the rate spans many orders of magnitude, incurring too much imprecision. The refinement of the M/D population discretisation eliminates the first inaccuracy. To obtain the clear bell-shaped pattern on RNA, one also has to refine the RNA population discretisation.
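The effect of a coarse level on a mass-action rate can be illustrated with a minimal Python sketch. It assumes that a population level is represented by inclusive bounds on the count and that the propensity of a reaction is its rate constant times the product of binomial coefficients of the reactant counts; the function `interval_propensity` and the example numbers (a rate constant of 2×10⁻⁴ and a D count anywhere in [1, 99], i.e. below the threshold 100) are hypothetical choices for illustration, not the authors' tool.

```python
from math import comb, prod


def interval_propensity(rate, reactants, intervals):
    """Lower and upper bound of a mass-action propensity over population intervals.

    rate      -- stochastic rate constant of the reaction
    reactants -- dict: species -> stoichiometric coefficient on the left-hand side
    intervals -- dict: species -> (lo, hi), inclusive bounds on the species count
    The propensity rate * prod(C(n_X, k_X)) is non-decreasing in every count,
    so its extremes are attained at the interval end points.
    """
    lo = rate * prod(comb(intervals[x][0], k) for x, k in reactants.items())
    hi = rate * prod(comb(intervals[x][1], k) for x, k in reactants.items())
    return lo, hi


# Hypothetical example: DNA.D + D -> DNA.2D with one DNA.D and D anywhere in [1, 99].
lo, hi = interval_propensity(2e-4, {"DNA.D": 1, "D": 1}, {"DNA.D": (1, 1), "D": (1, 99)})
print(lo, hi, hi / lo)  # the upper bound is about 99 times the lower bound
```

A finer discretisation shrinks the D interval and with it the ratio between the two bounds, which is the imprecision the refinement reduces.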


5.3 Viral Infection 


The viral infection model described in Table 3 represents the most challenging system we consider. It is highly stochastic and extremely stiff, with all species presenting high variance and some also very high molecular populations. Moreover, there is a bimodal distribution on the RNA population. As a consequence, the solution of the full CME, even using advanced reduction and aggregation techniques, is prohibitive due to state-space explosion, and stochastic simulations are very time-consuming. State-of-the-art hybrid approaches integrating the LNA and an adaptive population partitioning [9] can handle this system but also need a very long execution time. For example, a transient analysis up to time t = 50 requires around 20 min and up to t = 200 more than an hour.

To evaluate the accuracy of our approach on this challenging model, we also focus on the same transient analysis, namely, we are interested in the distribution of RNA at time t = 200. The analysis in [9] predicts a bimodal distribution where the probability that RNA is zero is around 20%, the remaining probability mass follows a Gaussian distribution with mean around 17, and the probability that there are more than 30 RNAs is close to zero. This is confirmed by the simulation-based analysis in [23], which also shows the gradual growth of the RNA population. The simulation-based analysis in [43], however, estimates a lower probability (around 3%) that RNA is 0 and a higher mean of the remaining Gaussian distribution (around 23). Recall that obtaining accurate results using simulations is extremely time-consuming due to very stiff reactions (a single simulation for t = 200 takes around 20 s).

Table 3. Viral Infection. The rates are in day⁻¹.

Fig. 5. Pruned abstraction for the viral infection model. The state vector is [P, RNA, DNA].

In the final experiments, we analyse the distribution of RNA at time t = 200 using our approach. The state vector is given as [P, RNA, DNA] and we start with the concrete state [0, 1, 0]. To reason adequately about the RNA population and to handle the very high population of the proteins, we use the following population discretisation: thresholds {10, 1000} for P, {10, 30} for RNA, and {10, 100} for DNA. As before, we use very relaxed bounds 10000, 100, and 1000 for P, RNA, and DNA, respectively. Note that we ignore the population of the virus V as it does not affect the dynamics of the other species. This simplification makes the visualisation of our approach more readable and has no effect on the complexity of the analysis.
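The mapping from concrete counts to population levels can be sketched in a few lines of Python. The sketch assumes that level 0 is reserved for a zero population and that a count equal to a threshold already belongs to the higher level; this boundary convention, the function names, and the omission of the upper bounds are assumptions made for illustration.

```python
from bisect import bisect_right


def population_level(count, thresholds):
    """Level 0 for an empty population, then one extra level per threshold reached."""
    if count < 0:
        raise ValueError("population counts are non-negative")
    if count == 0:
        return 0
    # bisect_right counts the thresholds t with t <= count (assumed boundary convention).
    return 1 + bisect_right(sorted(thresholds), count)


def abstract_state(concrete, thresholds):
    """Map a concrete state vector to its vector of population levels."""
    return [population_level(n, th) for n, th in zip(concrete, thresholds)]


# Thresholds from the viral infection experiment, state vector [P, RNA, DNA].
THRESHOLDS = [[10, 1000], [10, 30], [10, 100]]
print(abstract_state([0, 1, 0], THRESHOLDS))      # [0, 1, 0]
print(abstract_state([1500, 5, 20], THRESHOLDS))  # [3, 1, 2]
```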

Figure 5 illustrates the obtained abstract model enabling the following transient analysis (the full model is depicted in Fig. 10 [11, Appendix]). Within a few days, the system reaches from the initial state the loop (depicted by the purple dashed ellipse) within iteration i1. The loop includes states where RNA has level 1, DNA has level 2 and P oscillates between levels 2 and 3. Before entering the loop, there is a non-negligible probability (on the order of a few percent) that the RNA drops to 0 via the solid black branch that returns to the transient part of the loop in i1. In this branch, the system can also die out (not shown in this figure, see the full model) with probability on the order of tenths of a percent.
The average exit time of the loop in i1 is on the order of 10 days, and the system goes to the yellow loop within iteration i2, where the DNA level is increased to 3 (the RNA level is unchanged and P again oscillates between levels 2 and 3). The average exit time of the loop in i2 is again on the order of 10 days and the system goes to the dotted red loop within iteration i3. The transition represents the sequence of RNA synthesis that leads to RNA level 2. P oscillates as before. Finally, the system leaves the loop in i3 (this takes another dozen days) and reaches RNA level 3 in iterations i4 and i5, where the DNA level remains at level 3 and P oscillates. Iterations i4 and i5 thus roughly correspond to the examined transient time t = 200.

The analysis clearly demonstrates that our approach predicts behaviour that is well aligned with the previous experiments. We observe growth of the RNA population with a non-negligible probability of its extinction. The concrete quantities (i.e. the probability of the extinction and the mean RNA population) are closer to the analysis in [43]. The quantities are indeed affected by the population discretisation and can be further refined. We would like to emphasise that, in contrast to the methods presented in [9,23,43] requiring hours of intensive numerical computation, our approach can be done even manually on paper.
Abstract. Statistical model checking (SMC) is a technique for analysis of probabilistic systems that may be (partially) unknown. We present an SMC algorithm for (unbounded) reachability yielding probably approximately correct (PAC) guarantees on the results. We consider both the setting (i) with no knowledge of the transition function (where the only quantity required is a bound on the minimum transition probability) and (ii) with knowledge of the topology of the underlying graph. On the one hand, it is the first algorithm for stochastic games. On the other hand, it is the first practical algorithm even for Markov decision processes. Compared to previous approaches where PAC guarantees require running times longer than the age of the universe even for systems with a handful of states, our algorithm often yields reasonably precise results within minutes, without requiring knowledge of the mixing time.


1 Introduction 




Statistical model checking (SMC) [YS02a] is an analysis technique for probabilistic systems based on

1. simulating finitely many finitely long runs of the system,
2. statistical analysis of the obtained results,
3. yielding a confidence interval/probably approximately correct (PAC) result on the probability of satisfying a given property, i.e., there is a non-zero probability that the bounds are incorrect, but they are correct with probability that can be set arbitrarily close to 1.
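As an illustration of step 3 in the simplest setting (a bounded property of a Markov chain), the number of runs for a PAC estimate is commonly derived from Hoeffding's inequality: with n ≥ ln(2/δ)/(2ε²) independent runs, the empirical frequency is within ε of the true probability with probability at least 1 − δ. The Python sketch below is generic SMC under that assumption, not the algorithm of this paper; `simulate_run` is a hypothetical callable that returns whether one sampled run satisfies the property.

```python
import math
import random


def hoeffding_runs(eps, delta):
    """Runs n such that P(|p_hat - p| >= eps) <= delta for i.i.d. 0/1 outcomes."""
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))


def smc_estimate(simulate_run, eps, delta, rng):
    """(1 - delta)-confidence interval of half-width eps for the satisfaction probability."""
    n = hoeffding_runs(eps, delta)
    hits = sum(1 for _ in range(n) if simulate_run(rng))
    p_hat = hits / n
    return max(0.0, p_hat - eps), min(1.0, p_hat + eps)


# Toy "system": each simulated run satisfies the property with probability 0.3.
rng = random.Random(42)
print(hoeffding_runs(0.05, 0.01))  # 1060
print(smc_estimate(lambda r: r.random() < 0.3, eps=0.05, delta=0.01, rng=rng))
```

The required number of runs grows only logarithmically in 1/δ but quadratically in 1/ε, which is why tight intervals are costly.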


One of the advantages is that it can avoid the state-space explosion problem, albeit at the cost of weaker guarantees. Even more importantly, this technique is applicable even when the model is not known (black-box setting) or only qualitatively known (grey-box setting), where the exact transition probabilities are unknown, as in many cyber-physical systems.

In the basic setting of Markov chains [Nor98] with (time- or step-)bounded 
properties, the technique is very efficient and has been applied to numerous 
domains, e.g. biological [JCL+09,PGL+13], hybrid [ZPC10, DDL+12, EGF12, 
Lar12] or cyber-physical [BBB+10,CZ11,DDL+13] systems and a substantial 
tool support is available [JLS12, BDL+12,BCLS13, BHH12]. In contrast, when- 
ever either (i) infinite time-horizon properties, e.g. reachability, are considered or 
(ii) non-determinism is present in the system, providing any guarantees becomes 
significantly harder. 

Firstly, for infinite time-horizon properties we need a stopping criterion such that the infinite-horizon property can be reliably evaluated based on a finite prefix of the run yielded by simulation. This can rely on the complete knowledge of the system (white-box setting) [YCZ10, LP08], the topology of the system (grey box) [YCZ10, HJB+10], or a lower bound $p_{\min}$ on the minimum transition probability in the system (black box) [DHKP16, BCC+14].

Secondly, for Markov decision processes (MDP) [Put14] with (non-trivial) 
non-determinism, [HMZ+12] and [LP12] employ reinforcement learning [SB98] 
in the setting of bounded properties or discounted (and for the purposes of 
approximation thus also bounded) properties, respectively. The latter also yields 
PAC guarantees. 

Finally, for MDP with unbounded properties, [BFHH11] deals with MDP with spurious non-determinism, where the way it is resolved does not affect the desired property. The general non-deterministic case is treated in [FT14, BCC+14], yielding PAC guarantees. However, the former requires knowledge of the mixing time, which is at least as hard to compute; the algorithm in the latter is purely theoretical since before a single value is updated in the learning process, one has to simulate longer than the age of the universe even for a system as simple as a Markov chain with 12 states having at least 4 successors for some state.


Our contribution is an SMC algorithm with PAC guarantees for (i) MDP and unbounded properties, which runs in the order of minutes for realistic benchmarks [HKP+19] and confidence intervals, and (ii) stochastic games (SG), for which it is the first algorithm. It relies on different techniques from the literature.


1. The increased practical performance rests on two pillars: 

— extending early detection of bottom strongly connected components in 
Markov chains by [DHKP16] to end components for MDP and simple 
end components for SG; 

— improving the underlying PAC Q-learning technique of [SLW+06]:

(a) learning is now model-based with better information reuse instead of 
model-free, but in realistic settings with the same memory require- 
ments, 

(b) better guidance of learning due to interleaving with precise computa- 
tion, which yields more precise value estimates. 

(c) splitting confidence over all relevant transitions, allowing for variable 
width of confidence intervals on the learnt transition probabilities. 
2. The transition from algorithms for MDP to SG is possible via extending the over-approximating value iteration from MDP [BCC+14] to SG by [KKKW18].


To summarize, we give an anytime PAC SMC algorithm for (unbounded) reach- 
ability. It is the first such algorithm for SG and the first practical one for MDP. 


Related Work 


Most of the previous efforts in SMC have focused on the analysis of properties 
with bounded horizon [YS02a,SVA04, YKNP06, JCL+09, JLS12, BDL+12]. 

SMC of unbounded properties was first considered in [HLMP04] and the first approach was proposed in [SVA05], but shown to be incorrect in [HJB+10]. Notably, in [YCZ10] two approaches are described. The first approach proposes to terminate sampled paths at every step with some probability $p_{\mathit{term}}$ and re-weight the result accordingly. In order to guarantee the asymptotic convergence of this method, the second eigenvalue of the chain and its mixing time must be computed, which is as hard as the verification problem itself and requires the complete knowledge of the system (white box setting). The correctness of [LP08] relies on the knowledge of the second eigenvalue $\lambda$, too. The second approach of [YCZ10] requires the knowledge of the chain’s topology (grey box), which is used to transform the chain so that all potentially infinite paths are eliminated. In [HJB+10], a similar transformation is performed, again requiring knowledge of the topology. In [DHKP16], only (a lower bound on) the minimum transition probability $p_{\min}$ is assumed and PAC guarantees are derived. While unbounded properties cannot be analyzed without any information on the system, knowledge of $p_{\min}$ is a relatively light assumption in many realistic scenarios [DHKP16]. For instance, bounds on the rates for reaction kinetics in chemical reaction systems are typically known; for models in the PRISM language [KNP11], the bounds can be easily inferred without constructing the respective state space. In this paper, we thus adopt this assumption.

In the case with general non-determinism, one approach is to give the non-determinism a probabilistic semantics, e.g., using a uniform distribution instead, as for timed automata in [DLL+11a, DLL+11b, Lar13]. Others [LP12, HMZ+12, BCC+14] aim to quantify over all strategies and produce an ε-optimal strategy. In [HMZ+12], candidates for optimal strategies are generated and gradually improved, but “at any given point we cannot quantify how close to optimal the candidate scheduler is” (cited from [HMZ+12]) and the algorithm “does not in general converge to the true optimum” (cited from [LST14]). Further, [LST14, DLST15, DHS18] randomly sample compact representations of strategies, resulting in useful lower bounds if ε-schedulers are frequent. [HPS+19] gives a convergent model-free algorithm (with no bounds on the current error) and identifies that the previous [SKC+14] “has two faults, the second of which also affects approaches [...] [HAK18, HAK19]”.

Several approaches provide SMC for MDPs and unbounded properties with PAC guarantees. Firstly, similarly to [LP08, YCZ10], [FT14] requires (1) the mixing time $T$ of the MDP. The algorithm then yields PAC bounds in time polynomial in $T$ (which in turn can of course be exponential in the size of the MDP). Moreover, the algorithm requires (2) the ability to restart simulations also in non-initial states, (3) it only returns the strategy once all states have been visited (sufficiently many times), and thus (4) requires the size of the state space $|S|$. Secondly, [BCC+14], based on delayed Q-learning (DQL) [SLW+06], lifts the assumptions (2) and (3) and, instead of (1) the mixing time, requires only (a bound on) the minimum transition probability $p_{\min}$. Our approach additionally lifts the assumption (4) and allows for running times faster than those given by $T$, even without the knowledge of $T$.

Reinforcement learning (without PAC bounds) for stochastic games has been 
considered already in [LN81,Lit94,BT99]. [WT16] combines the special case of 
almost-sure satisfaction of a specification with optimizing quantitative objec- 
tives. We use techniques of [KKKW18], which however assumes access to the 
transition probabilities. 


2 Preliminaries 


2.1 Stochastic Games 


A probability distribution on a finite set $X$ is a mapping $\delta: X \to [0,1]$ such that $\sum_{x \in X} \delta(x) = 1$. The set of all probability distributions on $X$ is denoted by $\mathcal{D}(X)$. Now we define turn-based two-player stochastic games. As opposed to the notation of e.g. [Con92], we do not have special stochastic nodes, but rather a probabilistic transition function.


Definition 1 (SG). A stochastic game (SG) is a tuple $\mathcal{G} = (S, S_\Box, S_\circ, s_0, A, \mathsf{Av}, \mathsf{T})$, where $S$ is a finite set of states partitioned¹ into the sets $S_\Box$ and $S_\circ$ of states of the player Maximizer and Minimizer², respectively, $s_0 \in S$ is the initial state, $A$ is a finite set of actions, $\mathsf{Av}: S \to 2^A$ assigns to every state a set of available actions, and $\mathsf{T}: S \times A \to \mathcal{D}(S)$ is a transition function that, given a state $s$ and an action $a \in \mathsf{Av}(s)$, yields a probability distribution over successor states. Note that for ease of notation we write $\mathsf{T}(s,a,t)$ instead of $\mathsf{T}(s,a)(t)$.


A Markov decision process (MDP) is a special case of SG where $S_\circ = \emptyset$. A Markov chain (MC) can be seen as a special case of an MDP, where for all $s \in S: |\mathsf{Av}(s)| = 1$. We assume that SG are non-blocking, so for all states $s$ we have $\mathsf{Av}(s) \neq \emptyset$.

For a state $s$ and an available action $a \in \mathsf{Av}(s)$, we denote the set of successors by $\mathsf{Post}(s,a) := \{t \mid \mathsf{T}(s,a,t) > 0\}$. We say a state-action pair $(s,a)$ is an exit of a set of states $T$, written $(s,a)$ exits $T$, if $\exists t \in \mathsf{Post}(s,a): t \notin T$, i.e., if with some probability a successor outside of $T$ could be chosen.
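A minimal Python sketch of Post and the exit relation for an explicitly stored SG; the dictionary encoding of the transition function (keyed by state-action pairs, successor to probability) is an assumption made for illustration.

```python
# Illustrative encoding: (s, a) -> {t: T(s, a, t)}


def post(trans, s, a):
    """Post(s, a) = {t | T(s, a, t) > 0}."""
    return {t for t, p in trans[(s, a)].items() if p > 0}


def exits(trans, s, a, states):
    """(s, a) exits `states` iff some successor of (s, a) lies outside `states`."""
    return any(t not in states for t in post(trans, s, a))


TRANS = {
    ("s0", "a"): {"s0": 0.5, "s1": 0.5},
    ("s0", "b"): {"s0": 1.0},
    ("s1", "c"): {"s1": 1.0},
}
print(sorted(post(TRANS, "s0", "a")))   # ['s0', 's1']
print(exits(TRANS, "s0", "a", {"s0"}))  # True
print(exits(TRANS, "s0", "b", {"s0"}))  # False
```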

We consider algorithms that have limited information about the SG.


1 I.e., $S_\Box \subseteq S$, $S_\circ \subseteq S$, $S_\Box \cup S_\circ = S$, and $S_\Box \cap S_\circ = \emptyset$.
2 The names are chosen because Maximizer maximizes the probability of reaching a given target state, and Minimizer minimizes it.
Definition 2 (Black box and grey box). An algorithm inputs an SG as black box if it cannot access the whole tuple, but

— it knows the initial state,
— for a given state, an oracle returns its player and available actions,
— given a state $s$ and action $a$, it can sample a successor $t$ according to $\mathsf{T}(s,a)$,
— it knows $p_{\min} \le \min_{s \in S,\, a \in \mathsf{Av}(s),\, t \in \mathsf{Post}(s,a)} \mathsf{T}(s,a,t)$, an under-approximation of the minimum transition probability.³

When input as grey box it additionally knows the number $|\mathsf{Post}(s,a)|$ of successors for each state $s$ and action $a$.⁴
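Definition 2 can be read as an interface. The Python sketch below wraps an explicitly known model so that an algorithm only sees what a black-box (or grey-box) learner may see; the class and method names are hypothetical, and wrapping a known model is merely a convenient way to simulate such a system for testing.

```python
import random


class BlackBoxSG:
    """Wraps a known SG and exposes only the information of Definition 2 (black box)."""

    def __init__(self, initial, owner, actions, trans, p_min, seed=0):
        self._initial = initial
        self._owner = owner      # s -> "max" or "min"
        self._actions = actions  # s -> list of available actions
        self._trans = trans      # (s, a) -> {t: T(s, a, t)}
        self.p_min = p_min       # known lower bound on all positive T(s, a, t)
        self._rng = random.Random(seed)

    def initial_state(self):
        return self._initial

    def query(self, s):
        """Oracle: the player owning s and its available actions."""
        return self._owner[s], list(self._actions[s])

    def sample(self, s, a):
        """Sample a successor t with probability T(s, a, t)."""
        succ, probs = zip(*self._trans[(s, a)].items())
        return self._rng.choices(succ, weights=probs, k=1)[0]


class GreyBoxSG(BlackBoxSG):
    """Grey box: additionally reveals |Post(s, a)| (but not the successors)."""

    def num_successors(self, s, a):
        return sum(1 for p in self._trans[(s, a)].values() if p > 0)


G = GreyBoxSG(
    initial="s0",
    owner={"s0": "max", "s1": "min"},
    actions={"s0": ["a"], "s1": ["b"]},
    trans={("s0", "a"): {"s0": 0.5, "s1": 0.5}, ("s1", "b"): {"s1": 1.0}},
    p_min=0.5,
)
print(G.query("s0"), G.sample("s0", "a"), G.num_successors("s0", "a"))
```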


The semantics of SG is given in the usual way by means of strategies and the induced Markov chain [BK08] and its respective probability space, as follows. An infinite path $\rho$ is an infinite sequence $\rho = s_0 a_0 s_1 a_1 \cdots \in (S \times A)^\omega$ such that for every $i \in \mathbb{N}$, $a_i \in \mathsf{Av}(s_i)$ and $s_{i+1} \in \mathsf{Post}(s_i, a_i)$.

A strategy of Maximizer or Minimizer is a function $\sigma: S_\Box \to \mathcal{D}(A)$ or $S_\circ \to \mathcal{D}(A)$, respectively, such that $\sigma(s) \in \mathcal{D}(\mathsf{Av}(s))$ for all $s$. Note that we restrict to memoryless/positional strategies, as they suffice for reachability in SGs [CH12].

A pair $(\sigma, \tau)$ of strategies of Maximizer and Minimizer induces a Markov chain $\mathcal{G}^{\sigma,\tau}$ with states $S$, $s_0$ being initial, and the transition function $\mathsf{T}(s)(t) = \sum_{a \in \mathsf{Av}(s)} \sigma(s)(a) \cdot \mathsf{T}(s,a,t)$ for states of Maximizer and analogously for states of Minimizer, with $\sigma$ replaced by $\tau$. The Markov chain induces a unique probability distribution $\mathbb{P}^{\sigma,\tau}$ over measurable sets of infinite paths [BK08, Ch. 10].
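The induced transition function is a weighted sum over the chosen actions. The Python sketch below computes it for memoryless randomized strategies; for brevity it assumes that σ and τ are merged into one dictionary `strategy` covering all states, which is possible because the state sets of the two players are disjoint. The encoding is illustrative.

```python
from collections import defaultdict


def induced_chain(trans, strategy):
    """Transition function of the Markov chain induced by memoryless strategies.

    trans    -- (s, a) -> {t: T(s, a, t)}
    strategy -- s -> {a: probability of a in s}; sigma on Maximizer states,
                tau on Minimizer states
    Returns s -> {t: sum_a strategy[s][a] * T(s, a, t)}.
    """
    chain = {}
    for s, choice in strategy.items():
        row = defaultdict(float)
        for a, p_a in choice.items():
            for t, p in trans[(s, a)].items():
                row[t] += p_a * p
        chain[s] = dict(row)
    return chain


TRANS = {
    ("s0", "a"): {"s1": 1.0},
    ("s0", "b"): {"s0": 0.5, "s1": 0.5},
    ("s1", "c"): {"s1": 1.0},
}
STRATEGY = {"s0": {"a": 0.5, "b": 0.5}, "s1": {"c": 1.0}}
print(induced_chain(TRANS, STRATEGY))  # {'s0': {'s1': 0.75, 's0': 0.25}, 's1': {'s1': 1.0}}
```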


2.2 Reachability Objective 


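For a goal set $\mathsf{Goal} \subseteq S$, we write $\Diamond \mathsf{Goal} := \{s_0 a_0 s_1 a_1 \cdots \mid \exists i \in \mathbb{N}: s_i \in \mathsf{Goal}\}$ to denote the (measurable) set of all infinite paths which eventually reach Goal. For each $s \in S$, we define the value in $s$ as

$$\mathsf{V}(s) := \sup_\sigma \inf_\tau \mathbb{P}_s^{\sigma,\tau}(\Diamond \mathsf{Goal}) = \inf_\tau \sup_\sigma \mathbb{P}_s^{\sigma,\tau}(\Diamond \mathsf{Goal}),$$

where the equality follows from [Mar75]. We are interested in $\mathsf{V}(s_0)$, its ε-approximation and the corresponding (ε-)optimal strategies for both players.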


3 Up to this point, this definition conforms to black box systems in the sense of [SVA04] 
with sampling from the initial state, being slightly stricter than [YS02a] or [RP09], 
where simulations can be run from any desired state. Further, we assume that we 
can choose actions for the adversarial player or that she plays fairly. Otherwise the 
adversary could avoid playing her best strategy during the SMC, not giving SMC 
enough information about her possible behaviours. 

4 This requirement is slightly weaker than the knowledge of the whole topology, i.e. 
Post(s,a) for each s and a. 
Let Zero be the set of states from which there is no finite path to any state in Goal. The value function V satisfies the following system of equations, which is referred to as the Bellman equations:

$$\mathsf{V}(s) = \begin{cases} \max_{a \in \mathsf{Av}(s)} \mathsf{V}(s,a) & \text{if } s \in S_\Box \\ \min_{a \in \mathsf{Av}(s)} \mathsf{V}(s,a) & \text{if } s \in S_\circ \\ 1 & \text{if } s \in \mathsf{Goal} \\ 0 & \text{if } s \in \mathsf{Zero} \end{cases}$$

with the abbreviation $\mathsf{V}(s,a) := \sum_{s' \in S} \mathsf{T}(s,a,s') \cdot \mathsf{V}(s')$. Moreover, V is the least solution to the Bellman equations, see e.g. [CH08].
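The set Zero depends only on the graph of the SG, not on the exact probabilities or on which player owns a state. A minimal Python sketch computing it by backward reachability, assuming an explicit dictionary encoding of actions and transitions (an illustrative choice):

```python
def zero_states(states, actions, trans, goal):
    """Zero: states with no finite path to Goal, by a backward-reachability fixpoint.

    A state can reach Goal if some action has a positive-probability successor
    that can reach Goal; the owner of the state is irrelevant here.
    """
    can_reach = set(goal)
    changed = True
    while changed:
        changed = False
        for s in states:
            if s not in can_reach and any(
                p > 0 and t in can_reach
                for a in actions[s]
                for t, p in trans[(s, a)].items()
            ):
                can_reach.add(s)
                changed = True
    return set(states) - can_reach


STATES = ["s0", "s1", "g", "z"]
ACTIONS = {"s0": ["a"], "s1": ["b"], "g": ["x"], "z": ["y"]}
TRANS = {
    ("s0", "a"): {"s1": 1.0},
    ("s1", "b"): {"g": 0.5, "z": 0.5},
    ("g", "x"): {"g": 1.0},
    ("z", "y"): {"z": 1.0},
}
print(zero_states(STATES, ACTIONS, TRANS, {"g"}))  # {'z'}
```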


2.3 Bounded and Asynchronous Value Iteration 


The well-known technique of value iteration, e.g. [Put14, RF91], works by starting from an under-approximation of the value function and then applying the Bellman equations. This converges towards the least fixpoint of the Bellman equations, i.e. the value function. Since it is difficult to give a convergence criterion, the approach of bounded value iteration (BVI, also called interval iteration) was developed for MDP [BCC+14, HM17] and SG [KKKW18]. Besides the under-approximation, it also updates an over-approximation according to the Bellman equations. The most conservative over-approximation is to use an upper bound of 1 for every state. For the under-approximation, we can set the lower bound of target states to 1; all other states have a lower bound of 0. We use the function INITIALIZE_BOUNDS in our algorithms to denote that the lower and upper bounds are set as just described; see [AKW19, Algorithm 8] for the pseudocode. Additionally, BVI ensures that the over-approximation converges to the least fixpoint by taking special care of end components, which are the reason for not converging to the true value from above.
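To illustrate the interplay of the two bounds, here is a synchronous (non-simulation) BVI sketch in Python. It deliberately omits deflation and therefore assumes that the only end components are the absorbing Goal and Zero states; it also assumes Zero has been precomputed and fixes the upper bound of those states to 0, a small deviation from INITIALIZE_BOUNDS as described above. With other end components the upper bound can get stuck, which is exactly what the special treatment of end components addresses.

```python
def bvi(states, owner, actions, trans, goal, zero, eps=1e-6, max_iters=100_000):
    """Synchronous BVI without deflation (assumes no ECs outside Goal and Zero).

    owner   -- s -> "max" or "min"
    actions -- s -> list of available actions
    trans   -- (s, a) -> {t: T(s, a, t)}
    Returns the lower and upper bounds L and U as dicts.
    """
    L = {s: 1.0 if s in goal else 0.0 for s in states}
    U = {s: 0.0 if s in zero else 1.0 for s in states}  # Zero fixed to 0 (assumption)

    def q(bound, s, a):
        return sum(p * bound[t] for t, p in trans[(s, a)].items())

    for _ in range(max_iters):
        new_L, new_U = dict(L), dict(U)
        for s in states:
            if s in goal or s in zero:
                continue  # values fixed by the Bellman equations
            pick = max if owner[s] == "max" else min
            new_L[s] = pick(q(L, s, a) for a in actions[s])
            new_U[s] = pick(q(U, s, a) for a in actions[s])
        L, U = new_L, new_U
        if max(U[s] - L[s] for s in states) < eps:
            break
    return L, U


STATES = ["s0", "s1", "g", "z"]
OWNER = {"s0": "max", "s1": "min", "g": "max", "z": "max"}
ACTIONS = {"s0": ["a", "b"], "s1": ["c", "d"], "g": ["x"], "z": ["y"]}
TRANS = {
    ("s0", "a"): {"g": 0.5, "s1": 0.5}, ("s0", "b"): {"z": 1.0},
    ("s1", "c"): {"g": 1.0}, ("s1", "d"): {"g": 0.2, "z": 0.8},
    ("g", "x"): {"g": 1.0}, ("z", "y"): {"z": 1.0},
}
L, U = bvi(STATES, OWNER, ACTIONS, TRANS, goal={"g"}, zero={"z"})
print(L["s0"], U["s0"])  # both approximately 0.6
```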


Definition 3 (End component (EC)). A non-empty set $T \subseteq S$ of states is an end component (EC) if there is a non-empty set $B \subseteq \bigcup_{s \in T} \mathsf{Av}(s)$ of actions such that (i) for each $s \in T, a \in B \cap \mathsf{Av}(s)$ we do not have $(s,a)$ exits $T$ and (ii) for each $s, s' \in T$ there is a finite path $w = s a_0 \ldots a_n s' \in (T \times B)^* \times T$, i.e., the path stays inside $T$ and only uses actions in $B$.


Intuitively, ECs correspond to bottom strongly connected components of the Markov chains induced by possible strategies, so for some pair of strategies all possible paths starting in the EC remain there. An end component $T$ is a maximal end component (MEC) if there is no other end component $T'$ such that $T \subseteq T'$. Given an SG $\mathcal{G}$, the set of its MECs is denoted by $\mathsf{MEC}(\mathcal{G})$.
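Definition 3 can be checked directly for a given candidate pair (T, B). The Python sketch below verifies condition (i) and, for condition (ii), checks that every state of T reaches all others using only actions from B; the function name and encoding are illustrative, and it does not compute MECs (which is typically done by an iterated strongly connected component decomposition).

```python
def is_end_component(T_set, B, actions, trans):
    """Check Definition 3 for a candidate state set T_set and action set B."""
    T_set, B = set(T_set), set(B)
    if not T_set or not B:
        return False
    if not B <= {a for s in T_set for a in actions[s]}:
        return False  # B must consist of actions available in T_set

    def enabled(s):
        return [a for a in actions[s] if a in B]

    # (i) no action of B available in a state of T_set leaves T_set
    for s in T_set:
        for a in enabled(s):
            if any(p > 0 and t not in T_set for t, p in trans[(s, a)].items()):
                return False

    # (ii) every state reaches every state of T_set using only actions from B
    def reachable(src):
        seen, stack = {src}, [src]
        while stack:
            s = stack.pop()
            for a in enabled(s):
                for t, p in trans[(s, a)].items():
                    if p > 0 and t not in seen:
                        seen.add(t)
                        stack.append(t)
        return seen

    return all(reachable(s) == T_set for s in T_set)


TRANS = {("s0", "a"): {"s1": 1.0}, ("s1", "b"): {"s0": 1.0},
         ("s1", "c"): {"g": 1.0}, ("g", "x"): {"g": 1.0}}
ACTIONS = {"s0": ["a"], "s1": ["b", "c"], "g": ["x"]}
print(is_end_component({"s0", "s1"}, {"a", "b"}, ACTIONS, TRANS))       # True
print(is_end_component({"s0", "s1"}, {"a", "b", "c"}, ACTIONS, TRANS))  # False
```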

Note that, to stay in an EC in an SG, the two players would have to cooperate, 
since it depends on the pair of strategies. To take into account the adversarial 
behaviour of the players, it is also relevant to look at a subclass of ECs, the so-called simple end components, introduced in [KKKW18].
Definition 4 (Simple end component (SEC) [KKKW18]). An EC $T$ is called simple, if for all $s \in T$ it holds that $\mathsf{V}(s) = \mathsf{bestExit}(T, \mathsf{V})$, where

$$\mathsf{bestExit}(T, f) := \begin{cases} 1 & \text{if } T \cap \mathsf{Goal} \neq \emptyset \\ \max_{\substack{s \in T \cap S_\Box \\ (s,a) \text{ exits } T}} f(s,a) & \text{else} \end{cases}$$

is called the best exit (of Maximizer) from $T$ according to the function $f: S \to \mathbb{R}$. To handle the case that there is no exit of Maximizer in $T$ we set $\max_\emptyset = 0$.
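A direct Python transcription of bestExit, under the assumption that f(s, a) abbreviates the sum of T(s, a, s')·f(s') over successors, analogously to V(s, a) in the Bellman equations; the function name and dictionary encoding are illustrative.

```python
def best_exit(T_set, f, owner, actions, trans, goal):
    """bestExit(T, f) of Definition 4, with f(s, a) := sum_t T(s, a, t) * f(t)."""
    if any(s in goal for s in T_set):
        return 1.0
    exit_values = [
        sum(p * f[t] for t, p in trans[(s, a)].items())
        for s in T_set
        if owner[s] == "max"
        for a in actions[s]
        if any(p > 0 and t not in T_set for t, p in trans[(s, a)].items())
    ]
    return max(exit_values, default=0.0)  # max over the empty set is 0


TRANS = {("s0", "a"): {"s1": 1.0}, ("s0", "e"): {"out": 1.0},
         ("s1", "b"): {"s0": 1.0}, ("s1", "m"): {"out2": 1.0}}
ACTIONS = {"s0": ["a", "e"], "s1": ["b", "m"]}
OWNER = {"s0": "max", "s1": "min"}
F = {"s0": 0.5, "s1": 0.5, "out": 0.4, "out2": 0.9}
print(best_exit({"s0", "s1"}, F, OWNER, ACTIONS, TRANS, goal=set()))  # 0.4
```

In this toy game Minimizer's exit is worth 0.9, more than Maximizer's best exit 0.4, so Minimizer prefers to stay: the situation in which the upper bounds have to be deflated to bestExit.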


Intuitively, SECs are ECs where Minimizer does not want to use any of 
her exits, as all of them have a greater value than the best exit of Maximizer. 
Assigning any value between those of the best exits of Maximizer and Minimizer 
to all states in the EC is a solution to the Bellman equations, because both 
players prefer remaining and getting that value to using their exits [KKKW18, 
Lemma 1]. However, this is suboptimal for Maximizer, as the goal is not reached 
if the game remains in the EC forever. Hence we “deflate” the upper bounds 
of SECs, i.e. reduce them to depend on the best exit of Maximizer. $T$ is called maximal simple end component (MSEC), if there is no SEC $T'$ such that $T \subsetneq T'$.
Note that in MDPs, treating all MSECs amounts to treating all MECs. 


Algorithm 1. Bounded value iteration algorithm for SG (and MDP)
1: procedure BVI(SG G, target set Goal, precision ε > 0)
2:     INITIALIZE_BOUNDS
3:     repeat
4:         X ← SIMULATE until LOOPING or state in Goal is hit
5:         UPDATE(X)                    ▷ Bellman updates or their modification
6:         for T ∈ FIND_MSECs(X) do
7:             DEFLATE(T)               ▷ Decrease the upper bound of MSECs
8:     until U(s_0) − L(s_0) < ε

Algorithm 1 rephrases that of [KKKW18] and describes the general structure 
of all bounded value iteration algorithms that are relevant for this paper. We 
discuss it here since all our improvements refer to functions (in capitalized font) 
in it. In the next section, we design new functions, pinpointing the difference 
to the other papers. The pseudocode of the functions adapted from the other 
papers can be found, for the reader’s convenience, in [AKW19, Appendix A]. 
Note that to improve readability, we omit the parameters G, Goal, L and U of 
the functions in the algorithm. 


Bounded Value Iteration: For the standard bounded value iteration algo- 
rithm, Line 4 does not run a simulation, but just assigns the whole state 
space S to X*. Then it updates all values according to the Bellman equations. 


⁵ Since we mainly talk about simulation-based algorithms, we included this line to
make their structure clearer.
After that it finds all the problematic components, the MSECs, and “deflates”
them as described in [KKKW18], i.e. it reduces their values to ensure the
convergence to the least fixpoint. This suffices for the bounds to converge and the
algorithm to terminate [KKKW18, Theorem 2].
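To make the structure of Algorithm 1 concrete for the standard (non-simulating) variant, here is a minimal Python sketch; it is not the authors' implementation. It assumes a small, fully known SG given as dictionaries (`trans`, `owner` and `bvi` are hypothetical names) whose only end components are the absorbing target and sink states, with the sinks known up front, so no FIND-MSECs/DEFLATE step is needed. With other ECs, the upper bound would not converge without deflation.

```python
def bvi(trans, owner, goal, sinks, s0, eps=1e-6, max_iters=10_000):
    """Standard BVI on a fully known SG (sketch).

    trans[s][a] is a dict {successor: probability}; owner[s] is "max" or "min".
    Assumes no end components other than the absorbing goal/sink states.
    """
    states = set(trans) | set(goal) | set(sinks)
    # Initial bounds: L = 1 on goal states, 0 elsewhere; U = 0 on known sinks, 1 elsewhere.
    L = {s: 1.0 if s in goal else 0.0 for s in states}
    U = {s: 0.0 if s in sinks else 1.0 for s in states}
    for _ in range(max_iters):
        if U[s0] - L[s0] < eps:
            break
        new_L, new_U = dict(L), dict(U)
        for s, actions in trans.items():
            if s in goal or s in sinks:
                continue
            # Bellman update of every state-action pair, for both bounds.
            lows = [sum(p * L[t] for t, p in dist.items()) for dist in actions.values()]
            ups = [sum(p * U[t] for t, p in dist.items()) for dist in actions.values()]
            pick = max if owner[s] == "max" else min
            new_L[s], new_U[s] = pick(lows), pick(ups)
        L, U = new_L, new_U
    return L[s0], U[s0]


# Hypothetical example: s0 reaches the goal w.p. 0.3, otherwise moves to s1.
trans = {
    "s0": {"a": {"goal": 0.3, "s1": 0.7}},
    "s1": {"b": {"goal": 0.5, "sink": 0.5}, "d": {"goal": 0.2, "sink": 0.8}},
}
# Both bounds converge to V(s0) = 0.3 + 0.7 * 0.5 = 0.65 when both states maximise.
print(bvi(trans, {"s0": "max", "s1": "max"}, {"goal"}, {"sink"}, "s0"))
```

Making s1 a Minimizer state instead lets it choose action d, which lowers V(s0) to 0.3 + 0.7 · 0.2 = 0.44.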


Asynchronous Bounded Value Iteration: To tackle the state space explo- 
sion problem, asynchronous simulation/learning-based algorithms have been 
developed [MLG05, BCC+14, KKKW18]. The idea is not to update and deflate 
all states at once, since there might be too many, or since we only have limited 
information. Instead of considering the whole state space, a path through the 
SG is sampled by picking in every state one of the actions that look optimal 
according to the current over-/under-approximation and then sampling a suc- 
cessor of that action. This is repeated until either a target is found, or until the 
simulation is looping in an EC; the latter case occurs if the heuristic that picks 
the actions generates a pair of strategies under which both players only pick 
staying actions in an EC. After the simulation, only the bounds of the states on 
the path are updated and deflated. Since we pick actions which look optimal in
the simulation, we almost surely find an ε-optimal strategy and the algorithm
terminates [BCC+14, Theorem 3].
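The simulation step of the asynchronous variant can be sketched as follows. This Python sketch rests on assumptions of our own: Maximizer picks an action that maximises the upper bound and Minimizer one that minimises the lower bound, ties are broken uniformly at random, and a fixed `max_len` cut-off stands in for the LOOPING check of Sect. 3.3. All names are hypothetical.

```python
import random


def expectation(bound, dist):
    """Expected value of `bound` under the successor distribution `dist`."""
    return sum(p * bound[t] for t, p in dist.items())


def simulate(trans, owner, goal, sinks, L, U, s0, rng, max_len=1000):
    """Sample one path of states guided by the current bounds (sketch)."""
    path, s = [s0], s0
    while s not in goal and s not in sinks and len(path) < max_len:
        actions = trans[s]
        if owner[s] == "max":
            scores = {a: expectation(U, d) for a, d in actions.items()}
            best = max(scores.values())
        else:
            scores = {a: expectation(L, d) for a, d in actions.items()}
            best = min(scores.values())
        # Pick one of the actions that currently look optimal.
        a = rng.choice([b for b, v in scores.items() if v == best])
        # Sample a successor of the chosen action.
        succs, probs = zip(*actions[a].items())
        s = rng.choices(succs, weights=probs)[0]
        path.append(s)
    return path
```

Only the states on the returned path would then be updated and deflated.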


3 Algorithm 


3.1 Model-Based 


Given only limited information, updating cannot be done using T, since the true 
probabilities are not known. The approach of [BCC+14] is to sample for a high 
number of steps and accumulate the observed lower and upper bounds on the 
true value function for each state-action pair. When the number of samples is 
large enough, the average of the accumulator is used as the new estimate for 
the state-action pair, and thus the approximations can be improved and the 
results back-propagated, while giving statistical guarantees that each update 
was correct. However, this approach has several drawbacks, the biggest of which 
is that the number of steps before an update can occur is infeasibly large, often 
larger than the age of the universe, see Table 1 in Sect. 4. 

Our improvements to make the algorithm practically usable are linked to 
constructing a partial model of the given system. That way, we have more infor- 
mation available on which we can base our estimates, and we can be less conser- 
vative when giving bounds on the possible errors. The shift from model-free to 
model-based learning asymptotically increases the memory requirements from
O(|S|·|A|) (as in [SLW+06, BCC+14]) to O(|S|²·|A|). However, for systems
where each action has a small constant bound on the number of successors,
which is typical for many practical systems, e.g. classical PRISM benchmarks,
it is still O(|S|·|A|) with a negligible constant difference.

We thus track the number of times some successor t has been observed when 
playing action a from state s in a variable #(s,a,t). This implicitly induces 
the number of times each state-action pair (s,a) has been played, #(s,a) =
Σ_{t∈S} #(s,a,t). Given these numbers we can then calculate probability estimates
for every transition as described in the next subsection. They also induce the 
set of all states visited so far, allowing us to construct a partial model of the 
game. See [AKW19, Appendix A.2] for the pseudo-code of how to count the 
occurrences during the simulations. 
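The counters can be kept in a nested dictionary, as in the following Python sketch. The class and method names are hypothetical, and a path is assumed to alternate states and actions, as in the paths used in Examples 2 and 3.

```python
from collections import defaultdict


class Counters:
    """Transition counters #(s,a,t) of a partial model (sketch)."""

    def __init__(self):
        # (s, a) -> {t: #(s,a,t)}
        self.succ = defaultdict(lambda: defaultdict(int))

    def record_path(self, path):
        """Count the transitions of a path s0, a0, s1, a1, ..., sn."""
        for i in range(0, len(path) - 2, 2):
            s, a, t = path[i], path[i + 1], path[i + 2]
            self.succ[(s, a)][t] += 1

    def count(self, s, a, t=None):
        """#(s,a,t) if t is given, else #(s,a) = sum over t of #(s,a,t)."""
        observed = self.succ.get((s, a), {})
        return observed.get(t, 0) if t is not None else sum(observed.values())

    def states(self):
        """All states visited so far, i.e. the states of the partial model."""
        seen = set()
        for (s, _a), observed in self.succ.items():
            seen.add(s)
            seen.update(observed)
        return seen
```

Only #(s,a,t) is stored; #(s,a) and the set of visited states are derived on demand, mirroring the text.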


3.2 Safe Updates with Confidence Intervals Using Distributed 
Error Probability 


We use the counters to compute a lower estimate of the transition probability
for some error tolerance δ_T as follows: We view sampling t from state-action pair
(s,a) as a Bernoulli sequence, with success probability T(s,a,t), the number of
trials #(s,a) and the number of successes #(s,a,t). The tightest lower estimate
we can give using the Hoeffding bound (see [AKW19, Appendix D.1]) is

$$\widehat{T}(s,a,t) := \max\left(0,\ \frac{\#(s,a,t)}{\#(s,a)} - c\right), \tag{1}$$

where the confidence width $c := \sqrt{\frac{\ln(\delta_T)}{-2\,\#(s,a)}}$. Since c can exceed the
observed frequency (it can even be greater than 1), we limit the lower estimate to be at
least 0. Now we can give modified update equations:

$$\widehat{L}(s,a) := \sum_{t:\#(s,a,t)>0} \widehat{T}(s,a,t)\cdot L(t)$$

$$\widehat{U}(s,a) := \sum_{t:\#(s,a,t)>0} \widehat{T}(s,a,t)\cdot U(t) + \Big(1 - \sum_{t:\#(s,a,t)>0} \widehat{T}(s,a,t)\Big)$$

The idea is the same for both upper and lower bound: In contrast to the usual
Bellman equation (see Sect. 2.2) we use T̂ instead of T. But since the lower
estimates do not add up to one, there is some remaining probability for which
we need to under-/over-approximate the value it can achieve. We use the safe
approximations 0 and 1 for the lower and upper bound respectively; this is why
in L̂ there is no second term and in Û the whole remaining probability is added.
Algorithm 2 shows the modified update that uses the lower estimates; the proof
of its correctness is in [AKW19, Appendix D.2].

[Fig. 1: image not recoverable from the extracted text]
Fig. 1. A running example of an SG. The dashed part is only relevant for the later
examples. For actions with only one successor, we do not depict the transition
probability 1 (e.g. T(s0, a1, s1)). For state-action pair (s1, b2), the transition
probabilities are parameterized and instantiated in the examples where they are used.
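Equation (1) and the modified update equations translate directly into code. The following Python sketch (function names are hypothetical) computes T̂ for one state-action pair from its successor counts and returns L̂(s,a) and Û(s,a), valuing the unassigned probability mass at 0 and 1 respectively.

```python
import math


def confidence_width(n, delta_t):
    """Hoeffding width c = sqrt(ln(delta_T) / (-2 * #(s,a)))."""
    return math.sqrt(math.log(delta_t) / (-2 * n))


def lower_estimates(observed, delta_t):
    """Equation (1) for every observed successor t of one pair (s,a).

    `observed` maps t to #(s,a,t); #(s,a) is the sum of these counts.
    """
    n = sum(observed.values())
    c = confidence_width(n, delta_t)
    return {t: max(0.0, k / n - c) for t, k in observed.items() if k > 0}


def pair_bounds(observed, delta_t, L, U):
    """Return (L_hat(s,a), U_hat(s,a)) from the modified update equations."""
    t_hat = lower_estimates(observed, delta_t)
    mass = sum(t_hat.values())
    # Remaining probability 1 - mass is valued 0 in the lower bound ...
    l_hat = sum(p * L[t] for t, p in t_hat.items())
    # ... and 1 in the upper bound.
    u_hat = sum(p * U[t] for t, p in t_hat.items()) + (1.0 - mass)
    return l_hat, u_hat
```

UPDATE (Algorithm 2) then takes the maximum or minimum of these pair bounds over the available actions, depending on the owner of the state.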


Lemma 1 (UPDATE is correct). Given correct under- and over-approximations
L, U of the value function V, and correct lower probability estimates
T̂, the under- and over-approximations after an application of UPDATE are also
correct.


Algorithm 2. New update procedure using the probability estimates
1: procedure UPDATE(State set X)
2:   for f ∈ {L, U} do               ▷ For both functions
3:     for s ∈ X \ Goal do           ▷ For all non-target states in the given set
4:       f(s) = max_{a∈Av(s)} f̂(s,a)   if s ∈ S□
                min_{a∈Av(s)} f̂(s,a)   if s ∈ S○


Example 1. We illustrate how the calculation works and its huge advantage over
the approach from [BCC+14] on the SG from Fig. 1. For this example, ignore
the dashed part and let p1 = p2 = 0.5, i.e. we have no self-loop, and an even
chance to go to the target 1 or a sink 0. Observe that hence V(s0) = V(s1) = 0.5.

Given an error tolerance of δ = 0.1, the algorithm of [BCC+14] would have
to sample for more than 10° steps before it could attempt a single update. In
contrast, assume we have seen 5 samples of action b2, where 1 of them went to 1
and 4 of them to 0. Note that, in a sense, we were unlucky here, as the observed
averages are very different from the actual distribution. The confidence width for
δ_T = 0.1 and 5 samples is √(ln(0.1)/(−2·5)) ≈ 0.48. So given that data, we get
T̂(s1,b2,1) = max(0, 0.2 − 0.48) = 0 and T̂(s1,b2,0) = max(0, 0.8 − 0.48) = 0.32.
Note that both probabilities are in fact lower estimates for their true counterparts.

Assume we already found out that 0 is a sink with value 0; how we gain this
knowledge is explained in the following subsections. Then, after getting only
these 5 samples, UPDATE already decreases the upper bound of (s1, b2) to 0.68,
as we know that at least 0.32 of T(s1,b2) goes to the sink.

Given 500 samples of action b2, the confidence width of the probability
estimates has already decreased below 0.05. Then, since we have this confidence
width for both the upper and the lower bound, we can decrease the total
precision for (s1, b2) to 0.1, i.e. return an interval in the order of [0.45; 0.55]. △
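The numbers in Example 1 can be reproduced with a few lines of Python. The 5-sample counts are those of the example; for the 500-sample case we assume, purely for illustration, an even 250/250 split between the target 1 and the sink 0, and we use L = U = 1 on the target and L = U = 0 on the sink.

```python
import math


def width(n, delta_t=0.1):
    """Hoeffding confidence width for n samples of one state-action pair."""
    return math.sqrt(math.log(delta_t) / (-2 * n))


# 5 samples of b2: 1 went to the target 1, 4 went to the sink 0.
w5 = width(5)
t_target, t_sink = max(0.0, 1 / 5 - w5), max(0.0, 4 / 5 - w5)
u_after_5 = t_target * 1.0 + t_sink * 0.0 + (1.0 - t_target - t_sink)

# 500 samples, assuming (hypothetically) an even 250/250 split.
w500 = width(500)
t_half = 0.5 - w500
lower_500 = t_half * 1.0                        # L(1) = 1, L(0) = 0
upper_500 = t_half * 1.0 + (1.0 - 2 * t_half)   # U(1) = 1, U(0) = 0

print(f"width(5) = {w5:.3f}, U_hat(s1,b2) = {u_after_5:.3f}")
print(f"width(500) = {w500:.4f}, interval = [{lower_500:.3f}, {upper_500:.3f}]")
```

Under the even-split assumption the resulting interval is about [0.452, 0.548], consistent with the interval of width roughly 0.1 stated above.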


Summing up: with the model-based approach we can already start updating after 
very few steps and get a reasonable level of confidence with a realistic number 
of samples. In contrast, the state-of-the-art approach of [BCC+14] needs a very 
large number of samples even for this toy example. 

Since for UPDATE we need an error tolerance for every transition, we need
to distribute the given total error tolerance δ over all transitions in the current
partial model. For all states in the explored partial model Ŝ we know the number
of available actions and, since every successor has probability at least p_min, can
over-approximate the number of successors of each action as 1/p_min. Thus the error
tolerance for each transition can be set to

$$\delta_T := \frac{\delta \cdot p_{\min}}{|\{a \mid s \in \widehat{S} \wedge a \in \mathsf{Av}(s)\}|}.$$

This is illustrated in Example 4 in [AKW19, Appendix B].
Note that the fact that the error tolerance δ_T for every transition is the same
does not imply that the confidence width for every transition is the same, as the
latter becomes smaller with increasing number of samples #(s,a).
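The distribution of the error tolerance is a union bound, sketched here in Python. The input format `available` (a map from explored states to their actions) is hypothetical, and we assume each action belongs to a single state, so the set in the denominator has one element per explored state-action pair.

```python
def transition_error_tolerance(delta, p_min, available):
    """delta_T = delta * p_min / (number of state-action pairs in the partial model).

    Every successor has probability >= p_min, so a pair has at most 1/p_min
    successors and the partial model at most pairs / p_min transitions. Giving
    each of them delta_T keeps the total error below delta (union bound).
    """
    pairs = sum(len(actions) for actions in available.values())
    return delta * p_min / pairs


available = {"s0": ["a1", "a2"], "s1": ["b1", "b2"]}
print(transition_error_tolerance(0.1, 0.25, available))
```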


3.3 Improved EC Detection 


As mentioned in the description of Algorithm 1, we must detect when the simu- 
lation is stuck in a bottom EC and looping forever. However, we may also stop 
simulations that are looping in some EC but still have a possibility to leave it; 
for a discussion of different heuristics from [BCC+14,KKKW18], see [AKW19, 
Appendix A.3]. 

We choose to define LOOPING as follows: Given a candidate for a bottom EC,
we continue sampling until we are δ_T-sure (i.e. the error probability is smaller
than δ_T) that we cannot leave it. Then we can safely deflate the EC, i.e. decrease
all upper bounds to zero.

To detect that something is a δ_T-sure EC, we do not sample for the astronomical
number of steps as in [BCC+14], but rather extend the approach to detect
bottom strongly connected components from [DHKP16]. If in the EC-candidate
T there was some state-action pair (s,a) that actually has a probability to exit
T, that probability is at least p_min. So after sampling (s,a) n times, the
probability to overlook such a leaving transition is (1 − p_min)^n, and it should be
smaller than δ_T. Solving this inequality for the required number of samples n
yields

$$n > \frac{\ln(\delta_T)}{\ln(1 - p_{\min})}.$$
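The bound on n is easy to evaluate. This Python sketch (the function name is hypothetical) returns the smallest integer n satisfying the strict inequality, and reproduces the value 6 used in Example 2 for δ_T = 0.1 and p_min = 1/3.

```python
import math


def required_samples(delta_t, p_min):
    """Smallest integer n with (1 - p_min)**n < delta_t.

    Equivalent to n > ln(delta_T) / ln(1 - p_min); the direction of the
    inequality flips because ln(1 - p_min) < 0.
    """
    bound = math.log(delta_t) / math.log(1.0 - p_min)
    return math.floor(bound) + 1


print(required_samples(0.1, 1 / 3))
```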

Algorithm 3 checks that we have seen all staying state-action pairs n times,
and hence that we are δ_T-sure that T is an EC. Note that we restrict to staying
state-action pairs, since the requirement for an EC is only that there exist staying 
actions, not that all actions stay. We further speed up the EC-detection, because 
we do not wait for n samples in every simulation, but we use the aggregated 
counters that are kept over all simulations. 


Algorithm 3. Check whether we are δ_T-sure that T is an EC
1: procedure δ_T-sure EC(State set T)
2:   requiredSamples = ln(δ_T) / ln(1 − p_min)
3:   B ← {(s,a) | s ∈ T ∧ ¬(s,a) exits T}      ▷ Set of staying state-action pairs
4:   return ⋀_{(s,a)∈B} #(s,a) > requiredSamples
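Algorithm 3 can be sketched in Python over the counters of the partial model. We assume (our convention) that a pair (s, a) stays in T if every successor observed so far lies in T, and that the caller passes an EC candidate, so at least one staying pair exists.

```python
import math


def delta_sure_ec(T, counts, delta_t, p_min):
    """Algorithm 3 (sketch): have all staying pairs of T been sampled often enough?

    counts[(s, a)] maps each observed successor t to #(s,a,t).
    """
    required = math.log(delta_t) / math.log(1.0 - p_min)
    # Staying pairs: start in T and every successor observed so far is in T.
    staying = [pair for pair, succ in counts.items()
               if pair[0] in T and all(t in T for t in succ)]
    return all(sum(counts[pair].values()) > required for pair in staying)
```

With δ_T = 0.1 and p_min = 1/3, a single observed self-loop of (s1, b2) is not enough, whereas 6 observations are.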


We stop a simulation if LOOPING returns true, i.e. under the following three
conditions: (i) We have seen the current state before in this simulation (s ∈ X),
i.e. there is a cycle. (ii) This cycle is explainable by an EC T in our current
partial model. (iii) We are δ_T-sure that T is an EC.


Algorithm 4. Check if we are probably looping and should stop the simulation
1: procedure LOOPING(State set X, state s)
2:   if s ∉ X then
3:     return false                     ▷ Easy improvement to avoid overhead
4:   return ∃T ⊆ X. T is EC in partial model ∧ s ∈ T ∧ δ_T-sure EC(T)


Example 2. For this example, we again use the SG from Fig. 1 without the
dashed part, but this time with p1 = p2 = p3 = 1/3. Assume the path we simulated
is (s0, a1, s1, b2, s1), i.e. we sampled the self-loop of action b2. Then {s1} is a
candidate for an EC, because given our current observation it seems possible that
we will continue looping there forever. However, we do not stop the simulation
here, because we are not yet δ_T-sure about this. Given δ_T = 0.1, the required
samples for that are 6, since ln(0.1)/ln(1 − 1/3) ≈ 5.68. With high probability (greater than
1 − δ_T = 0.9), within these 6 steps we will sample one of the other successors
of (s1, b2) and thus realise that we should not stop the simulation in s1. If, on
the other hand, we are in state 0 or if in state s1 the guiding heuristic only picks
b1, then we are in fact looping for more than 6 steps, and hence we stop the
simulation. △


3.4 Adapting to Games: Deflating MSECs 


To extend the algorithm of [BCC+14] to SGs, instead of collapsing problematic 
ECs we deflate them as in [KKKW18], i.e. given an MSEC, we reduce the upper 
bound of all states in it to the upper bound of the bestExit of Maximizer. In 
contrast to [KKKW18], we cannot use the upper bound of the bestExit based on 
the true probability, but only based on our estimates. Algorithm 5 shows how to
deflate an MSEC and highlights the difference, namely that we use Û instead
of U.


Algorithm 5. Black box algorithm to deflate a set of states
1: procedure DEFLATE(State set X)
2:   for s ∈ X do
3:     U(s) = min(U(s), bestExit(X, Û))
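DEFLATE needs bestExit over the estimated upper bounds. The following Python sketch is an illustration, not the paper's code: it takes the observed successor sets and the values Û(s,a) as plain dictionaries, and returns 0 when Maximizer has no exit. We assume this convention here; it matches Example 3, where deflating an EC without an observed Maximizer exit drives all upper bounds to 0.

```python
def best_exit(X, owner, successors, u_hat):
    """Largest U_hat(s,a) over Maximizer pairs (s,a) with s in X that leave X.

    successors[(s, a)] is the set of observed successors of (s, a).
    Returns 0 if Maximizer has no exit from X (assumed convention).
    """
    exits = [u_hat[pair] for pair, succ in successors.items()
             if pair[0] in X and owner[pair[0]] == "max"
             and any(t not in X for t in succ)]
    return max(exits, default=0.0)


def deflate(X, U, owner, successors, u_hat):
    """Algorithm 5: cap the upper bound of every state in X by bestExit(X, U_hat)."""
    b = best_exit(X, owner, successors, u_hat)
    for s in X:
        U[s] = min(U[s], b)
```

Exits of Minimizer states are ignored, because Minimizer prefers to stay in an SEC rather than use them.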


The remaining question is how to find MSECs. The approach of [KKKW18] 
is to find MSECs by removing the suboptimal actions of Minimizer according 
to the current lower bound. Since it converges to the true value function, all 
MSECs are eventually found [KKKW18, Lemma 2]. Since Algorithm 6 can only
access the SG as a black box, there are two differences: We can only compare our
estimates of the lower bound L̂(s,a) to find out which actions are suboptimal.
Additionally there is the problem that we might overlook an exit from an EC,
and hence deflate to some value that is too small; thus we need to check that any
state set FIND-MSECs returns is a δ_T-sure EC. This is illustrated in Example 3.
For a bigger example of how all our functions work together, see Example 5 in 
[AKW19, Appendix B]. 


Algorithm 6. Finding MSECs in the game restricted to X for black box setting
1: procedure FIND_MSECs(State set X)
2:   suboptAct○ ← {(s, {a ∈ Av(s) | L̂(s,a) > L(s)}) | s ∈ S○ ∩ X}
3:   Av' ← Av without suboptAct○
4:   G' ← G restricted to states X and available actions Av'
5:   return {T ∈ MEC(G') | δ_T-sure EC(T)}


Example 3. For this example, we use the full SG from Fig. 1, including the
dashed part, with p1, p2 > 0. Let (s0, a1, s1, b2, s1, b1, s0, a2, s2, c, 1) be the path
generated by our simulation. Then in our partial view of the model, it seems
as if T = {s0, s1} is an MSEC, since using a2 is suboptimal for the minimizing
state s0⁶ and according to our current knowledge a1, b1 and b2 all stay inside T.
If we deflated T now, all states would get an upper bound of 0, which would be
incorrect.

Thus in Algorithm 6 we need to require that T is an EC δ_T-surely. This was
not satisfied in the example, as the state-action pairs have not been observed the
required number of times. Thus we do not deflate T, and our upper bounds stay
correct. Having seen (s1, b2) the required number of times, we probably know
that it is exiting T and hence will not make the mistake. △


3.5 Guidance and Statistical Guarantee 


It is difficult to give statistical guarantees for the algorithm we have developed 
so far (i.e. Algorithm 1 calling the new functions from Sects. 3.2, 3.3 and 3.4). 
Although we can bound the error of each function, applying them repeatedly can 
add up the error. Algorithm 7 shows our approach to get statistical guarantees: 
It interleaves a guided simulation phase (Lines 7-10) with a guaranteed standard 
bounded value iteration (called BVI phase) that uses our new functions (Lines 
11-16). 

The simulation phase builds the partial model by exploring states and remem- 
bering the counters. In the first iteration of the main loop, it chooses actions 
randomly. In all further iterations, it is guided by the bounds that the last BVI 


⁶ For δ_T = 0.2, sampling the path to target once suffices to realize that L̂(s0, a2) > 0.
phase computed. After N_k simulations (see below for a discussion of how to
choose N_k), all the gathered information is used to compute one version of the
partial model with probability estimates T̂ for a certain error tolerance δ_k. We
can continue with the assumption that these probability estimates are correct,
since this assumption is violated only with a probability smaller than our error
tolerance (see below for an explanation of the choice of δ_k). So in our correct
partial model, we re-initialize the lower and upper bound (Line 12), and execute a
guaranteed standard BVI. If the simulation phase already gathered enough data, i.e.
explored the relevant states and sampled the relevant transitions often enough,
this BVI achieves a precision smaller than ε in the initial state, and the
algorithm terminates. Otherwise we start another simulation phase that is guided
by the improved bounds.


Algorithm 7. Full algorithm for black box setting
1: procedure BLACKVI(SG G, target set Goal, precision ε > 0, error tolerance δ > 0)
2:   INITIALIZE-BOUNDS
3:   k = 1                                   ▷ guaranteed BVI counter
4:   Ŝ ← ∅                                   ▷ current partial model
5:   repeat
6:     k ← 2·k
7:     δ_k ← δ/k
       // Guided simulation phase
8:     for N_k times do
9:       X ← SIMULATE
10:      Ŝ ← Ŝ ∪ X
       // Guaranteed BVI phase
11:    δ_T ← δ_k·p_min / |{a | s ∈ Ŝ ∧ a ∈ Av(s)}|   ▷ Set δ_T as described in Section 3.2
12:    INITIALIZE_BOUNDS
13:    for k·|Ŝ| times do
14:      UPDATE(Ŝ)
15:      for T ∈ FIND_MSECs(Ŝ) do
16:        DEFLATE(T)
17:  until U(s0) − L(s0) < ε


Choice of δ_k: For each of the full BVI phases, we construct a partial model
that is correct with probability (1 − δ_k). To ensure that the sum of these errors
is not larger than the specified error tolerance δ, we use the variable k, which is
initialised to 1 and doubled in every iteration of the main loop. Hence for the
i-th BVI, k = 2^i. By setting δ_k = δ/k, we get that Σ_i δ_k = Σ_i δ/2^i ≤ δ, and hence
the error of all BVI phases does not exceed the specified error tolerance.
When to Stop Each BVI-Phase: The BVI phase might not converge if the
probability estimates are not good enough. We increase the number of iterations
for each BVI depending on k, because that way we ensure that it eventually
is allowed to run long enough to converge. On the other hand, since we always
run for finitely many iterations, we also ensure that, if we do not have enough
information yet, BVI is eventually stopped. Other stopping criteria could return
arbitrarily imprecise results [HM17]. We also multiply with |Ŝ| to improve the
chances of the early BVIs to converge, as that number of iterations ensures that
every value has been propagated through the whole model at least once.
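The bookkeeping of Algorithm 7 across phases can be illustrated in Python. In this sketch, `explored_sizes` is a hypothetical stand-in for |Ŝ| after each simulation phase. For every guaranteed BVI phase it lists the counter k, the tolerance δ_k = δ/k and the iteration budget k·|Ŝ|, and the sum of the δ_k stays below δ.

```python
def phase_schedule(delta, explored_sizes):
    """Per guaranteed BVI phase: (k, delta_k, number of BVI iterations).

    k starts at 1 and is doubled at the start of every iteration of the main
    loop, so the i-th phase uses k = 2**i and delta_k = delta / 2**i.
    """
    k, rows = 1, []
    for size in explored_sizes:
        k *= 2
        rows.append((k, delta / k, k * size))
    return rows


rows = phase_schedule(0.01, [10, 40, 120, 300])
for k, delta_k, iters in rows:
    print(k, delta_k, iters)
print("total error budget used:", sum(d for _, d, _ in rows))
```

However many phases are run, the geometric series keeps the accumulated error budget below δ, while the iteration budget grows with both k and the explored state space.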


Discussion of the Choice of N_k: The number of simulations between the
guaranteed BVI phases can be chosen freely; it can be a constant number every
time, or any sequence of natural numbers, possibly parameterised by e.g. k, |Ŝ|,
ε or any of the parameters of G. The design of particularly efficient choices or
learning mechanisms that adjust them on the fly is an interesting task left for
future work. We conjecture the answer depends on the given SG and “task” that
the user has for the algorithm: E.g. if one just needs a quick general estimate of
the behaviour of the model, a smaller choice of N_k is sensible; if on the other
hand a definite precision ε certainly needs to be achieved, a larger choice of N_k
is required.


Theorem 1. For any choice of sequence for N_k, Algorithm 7 is an anytime
algorithm with the following property: When it is stopped, it returns an interval
for V(s0) that is PAC⁷ for the given error tolerance δ and some ε', with
0 ≤ ε' ≤ 1.


Theorem 1 is the foundation of the practical usability of our algorithm. Given
some time frame and some N_k, it calculates an approximation for V(s0) that is
probably correct. Note that the precision ε' is independent of the input parameter
ε, and could in the worst case always be 1. However, in practice it is often
good (i.e. close to 0), as seen in the results in Sect. 4. Moreover, in our modified
algorithm, we can also give a convergence guarantee as in [BCC+14]. Although
mostly of theoretical interest, in [AKW19, Appendix D.4] we design such a
sequence N_k, too. Since this a-priori sequence has to work in the worst case, it
depends on an infeasibly large number of simulations.


Theorem 2. There exists a choice of N_k such that Algorithm 7 is PAC for any
input parameters ε, δ, i.e. it terminates almost surely and returns an interval for
V(s0) of width smaller than ε that is correct with probability at least 1 − δ.


⁷ Probably Approximately Correct, i.e. with probability greater than 1 − δ, the value
lies in the returned interval of width ε'.
3.6 Utilizing the Additional Information of Grey Box Input


In this section, we consider the grey box setting, i.e. for every state-action pair 
(s,a) we additionally know the exact number of successors |Post(s,a)|. Then 
we can sample every state-action pair until we have seen all successors, and 
hence this information amounts to having qualitative information about the 
transitions, i.e. knowing where the transitions go, but not with which probability. 

In that setting, we can improve the EC-detection and the estimated bounds in
UPDATE. For EC-detection, note that the whole point of δ_T-sure EC is to check
whether there are further transitions available; in grey box, we know this and
need not depend on statistics. For the bounds, note that the equations for L̂ and
Û both have two parts: The usual Bellman part and the remaining probability
multiplied with the most conservative guess of the bound, i.e. 0 and 1. If we
know all successors of a state-action pair, we do not have to be as conservative;
then we can use min_{t∈Post(s,a)} L(t) respectively max_{t∈Post(s,a)} U(t). Both these
improvements have a huge impact, as demonstrated in Sect. 4. However, of course,
they also assume more knowledge about the model.
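The difference between the black box and grey box bounds for a single state-action pair is shown in this Python sketch (hypothetical names and numbers). In the grey box case, `t_hat` is assumed to contain an entry for every successor in Post(s,a), so the remaining probability mass can be valued at the smallest lower bound and the largest upper bound among the known successors instead of 0 and 1.

```python
def black_box_bounds(t_hat, L, U):
    """Remaining probability mass is valued 0 (lower bound) and 1 (upper bound)."""
    rest = 1.0 - sum(t_hat.values())
    lower = sum(p * L[t] for t, p in t_hat.items())
    upper = sum(p * U[t] for t, p in t_hat.items()) + rest
    return lower, upper


def grey_box_bounds(t_hat, L, U):
    """Grey box: t_hat covers all of Post(s,a), so the remaining mass is
    valued min L(t) and max U(t) over the known successors."""
    rest = 1.0 - sum(t_hat.values())
    lower = sum(p * L[t] for t, p in t_hat.items()) + rest * min(L[t] for t in t_hat)
    upper = sum(p * U[t] for t, p in t_hat.items()) + rest * max(U[t] for t in t_hat)
    return lower, upper


t_hat = {"u": 0.3, "v": 0.3}
L = {"u": 0.3, "v": 0.2}
U = {"u": 0.5, "v": 0.4}
print(black_box_bounds(t_hat, L, U), grey_box_bounds(t_hat, L, U))
```

Here the interval for the pair shrinks from about [0.15, 0.67] to about [0.23, 0.47], because the remaining mass of 0.4 no longer has to be valued at the extremes 0 and 1.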


4 Experimental Evaluation 


We implemented the approach as an extension of PRISM-Games [CFK+13a]. 11 
MDPs with reachability properties were selected from the Quantitative Verifi- 
cation Benchmark Set [HKP+19]. Further, 4 stochastic games benchmarks from 
[CKJ12,8$12,CFK+13b,CKPS11] were also selected. We ran the experiments
on a 40 core Intel Xeon server running at 2.20 GHz per core and having 252 GB
of RAM. The tool, however, utilised only a single core and 1 GB of memory for
the model checking. Each benchmark was run 10 times with a timeout of 30 min.
We ran two versions of Algorithm 7, one with the SG as a black box, the other
as a grey box (see Definition 2). We chose N_k = 10,000 for all iterations. The
tool stopped either when a precision of 10^-8 was obtained or after 30 min. In
total, 16 different model-property combinations were tried out. The results of
the experiment are reported in Table 1.

In the black box setting, we obtained ε' < 0.1 on 6 of the benchmarks. 5
benchmarks were ‘hard’ and the algorithm did not improve the precision below
1. For 4 of them, it did not even finish the first simulation phase. If we decrease
N_k, the BVI phase is entered, but still no progress is made.

In the grey box setting, on 14 of 16 benchmarks, it took only 6 min to achieve
ε' < 0.1. For 8 of these, the exact value was found within that time. Less than
50% of the state space was explored in the case of pacman, pnueli-zuck-3,
rabin-3, zeroconf and cloud-5. A precision of ε' < 0.01 was achieved on 15/16
benchmarks over a period of 30 min.


PAC Statistical Model Checking 513 
Table 1. Achieved precision ε' given by our algorithm in both grey and black box
settings after running for a period of 30 min (see the paragraph below Theorem 1 for
why we use ε' and not ε). The first set of the models are MDPs and the second set are
SGs. ‘-’ indicates that the algorithm did not finish the first simulation phase and hence
partial BVI was not called. m is the number of steps required by the DQL algorithm
of [BCC+14] before the first update. As this number is very large, we report only
log10(m). For comparison, note that the age of the universe is approximately 10^26 ns;
the logarithm of the number of steps doable in this time is thus in the order of 26.

Model          | States | Explored % (Grey/Black) | Precision (Grey) | Precision (Black) | log10(m)
consensus      |    272 | 100/100                 | 0.00945          | 0.171             |     338
csma-2-2       |  1,038 | 93/93                   | 0.00127          | 0.2851            |   1,888
firewire       | 83,153 | 55/-                    | 0.0057           | 1                 | 129,430
ij-3           |      7 | 100/100                 | 0                | 0.0017            |   2,675
ij-10          |  1,023 | 100/100                 | 0                | 0.5407            |      17
pacman         |    498 | 18/47                   | 0.00058          | 0.0086            |   1,801
philosophers-3 |    956 | 56/21                   | 0                | 1                 |   2,068
pnueli-zuck-3  |  2,701 | 25/71                   | 0                | 0.0285            |   5,844
rabin-3        | 27,766 | 7/4                     | 0                | 0.026             | 110,097
wlan-0         |  2,954 | 100/100                 | 0                | 0.8667            |   9,947
zeroconf       |    670 | 29/27                   | 0.00007          | 0.0586            |   5,998
---------------+--------+-------------------------+------------------+-------------------+---------
cdmsn          |  1,240 | 100/98                  | 0                | 0.8588            |   3,807
cloud-5        |  8,842 | 49/20                   | 0.00031          | 0.0487            |  71,484
mdsm-1         | 62,245 | 69/-                    | 0.09625          | 1                 | 182,517
mdsm-2         | 62,245 | 72/-                    | 0.00055          | 1                 | 182,517
team-form-3    | 12,476 | 64/-                    | 0                | 1                 |  54,095


Figure 2 shows the evolution of the lower and upper bounds in both the grey- 
and the black box settings for 4 different models. Graphs for the other models 
as well as more details on the results are in [AKW19, Appendix C]. 
Fig. 2. Performance of our algorithm on various MDP and SG benchmarks in grey and
black box settings. Solid lines denote the bounds in the grey box setting while dashed
lines denote the bounds in the black box setting. The plotted bounds are obtained after
each partial BVI phase, which is why they start neither at [0, 1] nor at time 0.
Graphs of the remaining benchmarks may be found in [AKW19, Appendix C].


5 Conclusion 


We presented a PAC SMC algorithm for SG (and MDP) with the reachability 
objective. It is the first one for SG and the first practically applicable one. 
Nevertheless, there are several possible directions for further improvements. 
For instance, one can consider different sequences for lengths of the simula- 
tion phases, possibly also dependent on the behaviour observed so far. Further, 
the error tolerance could be distributed in a non-uniform way, allowing for fewer 
visits in rarely visited parts of end components. Since many systems are strongly 
connected, but at the same time feature some infrequent behaviour, this is the
next bottleneck to be attacked [KM19].
Abstract. Monitoring consists in deciding whether a log meets a given
specification. In this work, we propose an automata-based formalism to
monitor logs in the form of actions associated with time stamps and
arbitrary data values over infinite domains. Our formalism uses both
timing parameters and data parameters, and is able to output answers 
symbolic in these parameters and in the log segments where the prop- 
erty is satisfied or violated. We implemented our approach in an ad-hoc 
prototype SYMON, and experiments show that its high expressive power 
still allows for efficient online monitoring. 


1 Introduction 


Monitoring consists in checking whether a sequence of data (a log or a signal) 
satisfies or violates a specification expressed using some formalism. Offline mon- 
itoring consists in performing this analysis after the system execution, as the 
technique has access to the entire log in order to decide whether the specifi- 
cation is violated. In contrast, online monitoring can make a decision earlier, 
ideally as soon as a witness of the violation of the specification is encountered. 

Using existing formalisms (e.g., the metric first order temporal logic [14]), 
one can check whether a given bank customer withdraws more than 1,000 € 
every week. With formalisms extended with data, one may even identify such 
customers. Or, using an extension of the signal temporal logic (STL) [18], one can 
ask: “is it true that the value of variable x is always copied to y exactly 4 time
units later?” However, questions relating time and data using parameters become
much harder (or even impossible) to express using existing formalisms: “what
are the users and time frames during which a user withdraws more than half of 
the total bank withdrawals within seven days?” And even, can we synthesize the 
durations (not necessarily 7 days) for which this specification holds? Or “what 
is the set of variables for which there exists a duration within which their value 
is always copied to another variable?” In addition, detecting periodic behaviors 
without knowing the period can be hard to achieve using existing formalisms. 

In this work, we address the challenging problem of monitoring logs enriched
with both timing information and (infinite domain) data. In addition, we sig- 
nificantly push the existing limits of expressiveness so as to allow for a further 
level of abstraction using parameters: our specification can be both parametric 
in the time and in the data. The answer to this symbolic monitoring is richer 
than a pure Boolean answer, as it synthesizes the values of both time and data 
parameters for which the specification holds. This allows us notably to detect 
periodic behaviors without knowing the period while being symbolic in terms of 
data. For example, we can synthesize variable names (data) and delays for which
variables will have their value copied to another variable within the aforementioned
delay. In addition, we show that we can detect the log segments (start and end
dates) for which a specification holds.


Example 1. Consider a system updating three variables a, b and c (i.e., strings)
to values (rationals). An example of a log is given in Fig. 1a. Although our work
is event-based, we can give a graphical representation similar to that of signals
in Fig. 1b. Consider the following property: “for any variable px, whenever an
update of that variable occurs, then within strictly less than tp time units, the
value of variable b must be equal to that update”. The variable parameter px is
compared with string values and the timing parameter tp is used in the timing
constraints. We are interested in checking for which values of px and tp this
property is violated. This can be seen as a synthesis problem in both the variable
and timing parameters. For example, px = c and tp = 1.5 is a violation of
the specification, as the update of c to 2 at time 4 is not propagated to b
within 1.5 time units. Our algorithm outputs such a violation by a constraint, e.g.,
px = c ∧ tp ≤ 2. In contrast, the value of any signal at any time is always such
that either b is equal to that signal, or the value of b will be equal to that value
within at most 2 time units. Thus, the specification holds for any valuation of
the variable parameter px, provided tp > 2.
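The claims of Example 1 can be cross-checked by brute force on the log of Fig. 1a. The Python sketch below is not the PTDA-based monitoring algorithm of this paper; it evaluates the property directly, computing for each update of a variable other than b how long it takes until b holds the updated value. The text does not state the initial value of b; we assume it is 0. Under that assumption the worst delays are 1 for a and 2 for c, so the property is violated exactly when px = a ∧ tp ≤ 1 or px = c ∧ tp ≤ 2, and it holds for every px once tp > 2.

```python
import math

# Log of Fig. 1a as (timestamp, variable, value), in log order.
LOG = [(0, "a", 0), (1, "c", 1), (2, "a", 0), (3, "b", 1), (4, "b", 0),
       (4, "c", 2), (5, "a", 2), (6, "b", 2), (7, "c", 3), (9, "b", 3)]


def worst_delays(log, b_init=0):
    """Per variable x != b: the largest delay until b equals an update of x.

    `b_init` is an assumed initial value of b; the delay is 0 if b already
    holds the value and infinite if b never takes it later.
    """
    worst, b_val = {}, b_init
    for i, (tau, x, v) in enumerate(log):
        if x == "b":
            b_val = v
            continue
        if b_val == v:
            delay = 0.0
        else:
            later = [t2 for (t2, y, w) in log[i + 1:] if y == "b" and w == v]
            delay = (later[0] - tau) if later else math.inf
        worst[x] = max(worst.get(x, 0.0), delay)
    return worst


def violated(log, px, tp):
    """The copy must happen within strictly less than tp time units."""
    return worst_delays(log).get(px, 0.0) >= tp


print(worst_delays(LOG))
```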


We propose an automata-based approach to perform monitoring parametric 
in both time and data. We implement our work in a prototype SYMON and 
perform experiments showing that, while our formalism allows for high expres- 
siveness, it is also tractable even for online monitoring. 

We believe our framework balances expressiveness and monitoring performance
well: (i) Regarding expressiveness, comparison with the existing work is
summarized in Table 1 (see Sect. 2 for further details). (ii) Our monitoring is
complete, in the sense that it returns a symbolic constraint characterizing all
the parameter valuations that match a given specification. (iii) We also achieve
reasonable monitoring speed, especially given the degree of parametrization in
our formalism. Note that it is not easy to formally claim superiority in
expressiveness: proofs would require arguments such as the pumping lemma, and such
formal comparison does not seem to be a concern of the existing work. Moreover,
such formal comparison bears little importance for industrial practitioners:
expressivity via an elaborate encoding is hardly of practical use. We also note
that, in the existing work, we often observe gaps between the formalism in a
theory and the formalism that the resulting tool actually accepts. This is not
the case with the current framework.

Table 1. Comparison of monitoring expressiveness
[Table body not recoverable from the extracted text. Compared works: [7], [18], [14], [13], [80], [26], [4], [9] and this work. Criteria: timing parameters, data, parametric data, memory, aggregation, complete parameter identification.]

(a) Log:
@0 update(a, 0)
@1 update(c, 1)
@2 update(a, 0)
@3 update(b, 1)
@4 update(b, 0)
@4 update(c, 2)
@5 update(a, 2)
@6 update(b, 2)
@7 update(c, 3)
@9 update(b, 3)
(b) Graphical representation [not recoverable from the extracted text]
(c) Monitoring PTDA [not recoverable from the extracted text]
Fig. 1. Monitoring copy to b within tp time units


Outline. After discussing related works in Sect. 2, we introduce the necessary 
preliminaries in Sect. 3, and our parametric timed data automata in Sect. 4. We 
present our symbolic monitoring approach in Sect. 5 and conduct experiments 
in Sect. 6. We conclude in Sect. 7. 
2 Related Works 


Robustness and Monitoring. Robust (or quantitative) monitoring extends the 
binary question whether a log satisfies a specification by asking “by how much” 
the specification is satisfied. The quantification of the distance between a 
signal and a signal temporal logic (STL) specification has been addressed in, e.g., 
[20-23, 25, 27] (or in a slightly different setting in [5]). The distance can be 
understood in terms of space (“signals”) or time. In [6], the distance also accounts for 
reordering of events. In [10], the robust pattern matching problem is considered 
over signal regular expressions, by quantifying the distance between the signal 
regular expression specification and the segments of the signal. For piecewise- 
constant and piecewise-linear signals, the problem can be effectively solved using 
a finite union of convex polyhedra. While our framework does not fit in robust 
monitoring, we can simulate both the robustness w.r.t. time (using timing param- 
eters) and w.r.t. data, e.g., signal values (using data parameters). 


Monitoring with Data. The tool MARQ [30] performs monitoring using Quanti- 
fied Event Automata (QEA) [12]. This approach and ours share the automata- 
based framework, the ability to express some first-order properties using “events 
containing data” (which we encode using local variables associated with actions), 
and data may be quantified. However, [30] does not seem to natively support 
specifications parametric in time; in addition, [30] does not perform complete 
(“symbolic”) parameter synthesis, but outputs the violating entries of the log. 

The metric first order temporal logic (MFOTL) allows for a high 
expressiveness by allowing universal and existential quantification over data, which can 
be seen as a way to express parameters. A monitoring algorithm is presented for 
a safety fragment of MFOTL in [14]. Aggregation operators are added in [13], 
allowing one to compute sums or maximums over data. A fragment of this logic is 
implemented in MONPOLY [15]. While these works are highly expressive, they 
do not natively consider timing parameters; in addition, MONPOLY does not 
output symbolic answers, i.e., symbolic conditions on the parameters to ensure 
validity of the formula. 

In [26], binary decision diagrams (BDDs) are used to symbolically 
represent the observed data in QTL. This can be seen as monitoring data against 
a parametric specification, with a symbolic internal encoding. However, their 
implementation DEJAVU only outputs concrete answers. In contrast, we are 
able to provide symbolic answers (both in timing and data parameters), e.g., in 
the form of unions of polyhedra for rationals, and unions of string constraints 
using equalities (=) and inequalities (≠). 


Freeze Operator. In [18], STL is extended with a freeze operator that can 
“remember” the value of a signal, to compare it to a later value of the same 
signal. This logic STL* can express properties such as “In the initial 10s, x 
copies the values of y within a delay of 4s”: $\mathbf{G}_{[0,10]} * (\mathbf{G}_{[0,4]}\, y^* = x)$. While the 
setting is somewhat different (STL* operates over signals while we operate over 
timed data words), requirements such as the one above can easily be encoded 
in our framework. In addition, we are able to synthesize the delay within which 
the values are always copied, as in Example 1. In contrast, it is not possible to 
determine using STL* which variables and which delays violate the specification. 


Monitoring with Parameters. In [7], a log in the form of a dense-time real-valued 
signal is tested against a parameterized extension of STL, where parameters can 
be used to model uncertainty both in signal values and in timing values. The 
output comes in the form of a subset of the parameter space for which the 
formula holds on the log. In [9], the focus is only on signal parameters, with an 
improved efficiency by reusing techniques from robust monitoring. While 
[7,9] fit in the framework of signals and temporal logics and we fit in words and 
automata, our work shares similarities with [7,9] in the sense that we can express 
data parameters; in addition, [9] is able, as in our work, to exhibit the segment 
of the log associated with the parameter valuations for which the specification 
holds. A main difference, however, is that we can use memory and aggregation, 
thanks to arithmetic on variables. 

In [24], the problem of inferring temporal logic formulae with constraints 
that hold in a given numerical data time series is addressed. 


Timed Pattern Matching. A recent line of work is that of timed pattern match- 
ing, that takes as input a log and a specification, and decides where in the log 
the specification is satisfied or violated. On the one hand, a line of works con- 
siders signals, with specifications either in the form of timed regular expressions 
[11,31-33], or a temporal logic [34]. On the other hand, a line of works considers 
timed words, with specifications in the form of timed automata [4,36]. We will 
see that our work can also encode parametric timed pattern matching. 
Therefore, our work can be seen as a two-dimensional extension of both lines of works: 
first, we add timing parameters ([4] also considers similar timing parameters) 
and, second, we add data, themselves extended with parameters. That is, 
coming back to Example 1, [31-33,36] could only infer the segments of the log for 
which the property is violated for a given (fixed) variable and a given (fixed) 
timing parameter; while [4] could infer both the segments of the log and the 
timing parameter valuations, but not which variable violates the specification. 


Summary. We compare related works in Table 1. “Timing parameters” denotes 
the ability to synthesize unknown constants used in timing constraints (e.g., 
modalities intervals, or clock constraints). “?” denotes works not natively 
supporting this, although it might be encoded. The term “Data” refers to the ability 
to manage logs over infinite domains (apart from timestamps). For example, the 
log in Fig. 1a features, beyond timestamps, both strings (variable names) and 
rationals (values). Also, works based on real-valued signals are naturally able to 
manage (at least one type of) data. “Parametric data” refers to the ability to 
express formulas where data (including signal values) are compared to 
(quantified or unquantified) variables or unknown parameters; for example, in the 
log in Fig. 1a, an example of a property parametric in data is to synthesize the 
parameters for which the difference of values between two consecutive updates of 
variable px is always below pv, where px is a string parameter and pv a 
rational-valued parameter. “Memory” is the ability to remember past data; this can be 
achieved using, e.g., the freeze operator of STL*, or variables (e.g., in [14, 26, 30]). 
“Aggregation” is the ability to aggregate data using operators such as sum or 
maximum; this allows one to express properties such as “A user must not withdraw 
more than $10,000 within a 31 day period” [13]. This can be supported using 
dedicated aggregation operators [13] or using variables ([30], and our work). 
“Complete parameter identification” denotes the synthesis of the set of 
parameters that satisfy or violate the property. Here, “N/A” denotes the absence of 
parameters [18], or the case where parameters are used in a way (existentially or 
universally quantified) such that the identification is not explicit (instead, the position 
of the log where the property is violated is returned [26]). In contrast, we return 
in a symbolic manner (as in [4,7]) the exact set of (data and timing) parameters 
for which a property is satisfied. “√/×” denotes “yes” in the theory paper, but 
not in the tool. 


3 Preliminaries 


Clocks, Timing Parameters and Timed Guards. We assume a set $C = \{c_1, \dots, c_H\}$ of clocks, i.e., real-valued variables that evolve at the same rate. A 
clock valuation is $\nu : C \to \mathbb{R}_{\geq 0}$. We write $\vec{0}$ for the clock valuation assigning 0 
to all clocks. Given $d \in \mathbb{R}_{\geq 0}$, $\nu + d$ is s.t. $(\nu + d)(c) = \nu(c) + d$, for all $c \in C$. 
Given $R \subseteq C$, we define the reset of a valuation $\nu$, denoted by $[\nu]_R$, as follows: 
$[\nu]_R(c) = 0$ if $c \in R$, and $[\nu]_R(c) = \nu(c)$ otherwise. 

We assume a set $TP = \{tp_1, \dots, tp_J\}$ of timing parameters. A timing parameter 
valuation is $\gamma : TP \to \mathbb{Q}_+$. We assume $\bowtie \in \{<, \leq, =, \geq, >\}$. A timed guard $tg$ 
is a constraint over $C \cup TP$ defined by a conjunction of inequalities of the form 
$c \bowtie d$, or $c \bowtie tp$ with $d \in \mathbb{N}$ and $tp \in TP$. Given $tg$, we write $\nu \models \gamma(tg)$ if the 
expression obtained by replacing each $c$ with $\nu(c)$ and each $tp$ with $\gamma(tp)$ in $tg$ 
evaluates to true. 
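A minimal Python sketch of clock valuations, delays, resets and timed-guard satisfaction follows. It assumes valuations are dictionaries and that a timed guard is a list of atoms (clock, operator, bound) whose bound is either an integer constant or the name of a timing parameter; this encoding and the helper names are ours, not part of the formalism.

```python
import operator
from typing import Callable, Dict, FrozenSet, List, Tuple, Union

ClockVal = Dict[str, float]          # nu : C -> R>=0
ParamVal = Dict[str, float]          # gamma : TP -> Q+
Atom = Tuple[str, str, Union[int, str]]   # (clock, operator, constant d or parameter name)

OPS: Dict[str, Callable[[float, float], bool]] = {
    "<": operator.lt, "<=": operator.le, "==": operator.eq,
    ">=": operator.ge, ">": operator.gt,
}


def delay(nu: ClockVal, d: float) -> ClockVal:
    """nu + d: every clock advances by d."""
    return {c: x + d for c, x in nu.items()}


def reset(nu: ClockVal, R: FrozenSet[str]) -> ClockVal:
    """[nu]_R: clocks in R go back to 0, the others keep their value."""
    return {c: (0.0 if c in R else x) for c, x in nu.items()}


def satisfies(nu: ClockVal, gamma: ParamVal, tg: List[Atom]) -> bool:
    """nu |= gamma(tg) for a conjunction tg of atoms."""
    for clock, op, bound in tg:
        value = gamma[bound] if isinstance(bound, str) else bound
        if not OPS[op](nu[clock], value):
            return False
    return True


nu = reset(delay({"c": 0.0}, 2046.0), frozenset({"c"}))   # {'c': 0.0}
nu = delay(nu, 120.0)                                      # {'c': 120.0}
print(satisfies(nu, {"tp": 100.0}, [("c", "<=", "tp")]))   # False
print(satisfies(nu, {"tp": 100.0}, [("c", ">", "tp")]))    # True
```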


Variables, Data Parameters and Data Guards. For the sake of simplicity, we 
assume a single infinite domain D for data. The formalism defined in Sect. 4 
can be extended in a straightforward manner to different domains for different 
variables (and our implementation does allow for different types). The case of 
finite data domain is immediate too. We define this formalism in an abstract 
manner, so as to allow a sort of parameterized domain. 

We assume a set $V = \{v_1, \dots, v_M\}$ of variables valued over $D$. These variables 
are internal variables, that allow a high expressive power in our framework, 
as they can be compared with or updated to other variables or parameters. We also 
assume a set $LV = \{lv_1, \dots, lv_O\}$ of local variables valued over $D$. These variables 
will only be used locally along a transition in the “argument” of the action (e.g., 
x and v in update(x, v)), and in the associated guard and (right-hand part of) 
updates. We assume a set $VP = \{vp_1, \dots, vp_N\}$ of data parameters, i.e., unknown 
variable constants. 
A data type $(D, DE, DU)$ is made of (i) an infinite domain $D$, (ii) a set of 
admissible Boolean expressions $DE$ (that may rely on $V$, $LV$ and $VP$), which will 
define the type of guards over variables in our subsequent automata, and (iii) a 
domain for updates $DU$ (that may rely on $V$, $LV$ and $VP$), which will define the 
type of updates of variables in our subsequent automata. 


Example 2. As a first example, let us define the data type for rationals. We have 
$D = \mathbb{Q}$. Let us define Boolean expressions. A rational comparison is a constraint 
over $V \cup LV \cup VP$ defined by a conjunction of inequalities of the form $v \bowtie d$, 
$v \bowtie v'$, or $v \bowtie vp$ with $v, v' \in V \cup LV$, $d \in \mathbb{Q}$ and $vp \in VP$. $DE$ is the set of all 
rational comparisons over $V \cup LV \cup VP$. Let us then define updates. First, a linear 
arithmetic expression over $V \cup LV \cup VP$ is $\sum_i \alpha_i v_i + \beta$, where $v_i \in V \cup LV \cup VP$ 
and $\alpha_i, \beta \in \mathbb{Q}$. Let $\mathcal{LA}(V \cup LV \cup VP)$ denote the set of arithmetic expressions 
over $V$, $LV$ and $VP$. We then have $DU = \mathcal{LA}(V \cup LV \cup VP)$. 

As a second example, let us define the data type for strings. We have $D = \mathbb{S}$, 
where $\mathbb{S}$ denotes the set of all strings. A string comparison is a constraint over 
$V \cup LV \cup VP$ defined by a conjunction of comparisons of the form $v \approx s$, $v \approx v'$, 
or $v \approx vp$ with $v, v' \in V \cup LV$, $s \in \mathbb{S}$, $vp \in VP$ and $\approx \in \{=, \neq\}$. $DE$ is the set of 
all string comparisons over $V \cup LV \cup VP$. $DU = V \cup LV \cup \mathbb{S}$, i.e., a string variable 
can be assigned another string variable, or a concrete string. 


A variable valuation is $\mu : V \to D$. A local variable valuation is a partial 
function $\eta : LV \rightharpoonup D$. A data parameter valuation is $\zeta : VP \to D$. Given a data 
guard $dg \in DE$, a variable valuation $\mu$, a local variable valuation $\eta$ defined for 
the local variables in $dg$, and a data parameter valuation $\zeta$, we write $(\mu, \eta) \models \zeta(dg)$ 
if the expression obtained by replacing within $dg$ all occurrences of each 
data parameter $vp_i$ by $\zeta(vp_i)$ and all occurrences of each variable $v_j$ (resp. local 
variable $lv_k$) with its concrete valuation $\mu(v_j)$ (resp. $\eta(lv_k)$) evaluates to true. 
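For the string data type of Example 2, checking $(\mu, \eta) \models \zeta(dg)$ can be sketched as follows. We assume a hypothetical encoding in which a guard is a list of comparisons between tagged terms; a missing entry in η raises `KeyError`, mirroring the requirement that η be defined for the local variables of dg.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding of the string data type of Example 2.
# A term is ("var", name), ("local", name), ("param", name) or ("const", string).
Term = Tuple[str, str]
Comparison = Tuple[Term, str, Term]          # operator is "==" or "!="
Valuation = Dict[str, str]


def value_of(term: Term, mu: Valuation, eta: Valuation, zeta: Valuation) -> str:
    kind, name = term
    if kind == "var":
        return mu[name]                      # mu : V -> D
    if kind == "local":
        return eta[name]                     # eta is partial: KeyError if undefined
    if kind == "param":
        return zeta[name]                    # zeta : VP -> D
    return name                              # a concrete string s


def holds(dg: List[Comparison], mu: Valuation, eta: Valuation, zeta: Valuation) -> bool:
    """(mu, eta) |= zeta(dg) for a conjunction dg of string (in)equalities."""
    for lhs, op, rhs in dg:
        equal = value_of(lhs, mu, eta, zeta) == value_of(rhs, mu, eta, zeta)
        if equal != (op == "=="):
            return False
    return True


# Data guard f = vp of Fig. 2b, with eta taken from open(Hakuchi.txt, rw)
eta = {"f": "Hakuchi.txt", "m": "rw"}
guard = [(("local", "f"), "==", ("param", "vp"))]
print(holds(guard, {}, eta, {"vp": "Hakuchi.txt"}))   # True
print(holds(guard, {}, eta, {"vp": "Unagi.mp4"}))     # False
```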

A parametric data update is a partial function $PDU : V \rightharpoonup DU$. That is, we 
can assign to a variable an expression over data parameters and other variables, 
according to the data type. Given a parametric data update $PDU$, a variable 
valuation $\mu$, a local variable valuation $\eta$ (defined for all local variables appearing 
in $PDU$), and a data parameter valuation $\zeta$, we define $[\mu]_{\eta(\zeta(PDU))} : V \to D$ as: 

$$[\mu]_{\eta(\zeta(PDU))}(v) = \begin{cases} \mu(v) & \text{if } PDU(v) \text{ is undefined} \\ \eta(\mu(\zeta(PDU(v)))) & \text{otherwise} \end{cases}$$

where $\eta(\mu(\zeta(PDU(v))))$ denotes the replacement within the update expression 
$PDU(v)$ of all occurrences of each data parameter $vp_i$ by $\zeta(vp_i)$, and all 
occurrences of each variable $v_j$ (resp. local variable $lv_k$) with its concrete valuation 
$\mu(v_j)$ (resp. $\eta(lv_k)$). Observe that this replacement gives a value in $D$, therefore 
the result of $[\mu]_{\eta(\zeta(PDU))}$ is indeed a variable valuation $V \to D$. That 
is, $[\mu]_{\eta(\zeta(PDU))}$ computes the new (non-parametric) variable valuation obtained 
after applying to $\mu$ the partial function $PDU$ valuated with $\zeta$. 


Table 2. Variables, parameters and valuations used in guards 

          | Timed guards             | Data guards 
          | Clock | Timing parameter | (Data) variable | Local variable | Data parameter 
Variable  | c     | tp               | v               | lv             | vp 
Valuation | ν     | γ                | μ               | η              | ζ 


Example 3. Consider the data type for rationals, the variables set $\{v_1, v_2\}$, the 
local variables set $\{lv_1, lv_2\}$ and the parameters set $\{vp_1\}$. Let $\mu$ be the variable 
valuation such that $\mu(v_1) = 1$ and $\mu(v_2) = 2$, and $\eta$ be the local variable valuation 
such that $\eta(lv_1) = 2$ and $\eta(lv_2)$ is not defined. Let $\zeta$ be the data parameter 
valuation such that $\zeta(vp_1) = 1$. Consider the parametric data update function $PDU$ 
such that $PDU(v_1) = 2 \times v_1 + v_2 - lv_1 + vp_1$, and $PDU(v_2)$ is undefined. Then the 
result of $[\mu]_{\eta(\zeta(PDU))}$ is $\mu'$ such that $\mu'(v_1) = 2 \times \mu(v_1) + \mu(v_2) - \eta(lv_1) + \zeta(vp_1) = 3$ 
and $\mu'(v_2) = 2$. 
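The following Python sketch computes $[\mu]_{\eta(\zeta(PDU))}$ for the rational data type and reproduces Example 3. It assumes a hypothetical encoding of linear expressions as a coefficient map plus a constant, and that the names in V, LV and VP are pairwise distinct; exact arithmetic uses the standard `fractions` module.

```python
from fractions import Fraction
from typing import Dict, Tuple

# Hypothetical encoding of a linear expression sum_i alpha_i * x_i + beta
# over V, LV and VP: ({x_i: alpha_i}, beta). Names in V, LV, VP are assumed distinct.
LinExpr = Tuple[Dict[str, Fraction], Fraction]
Val = Dict[str, Fraction]


def evaluate(expr: LinExpr, mu: Val, eta: Val, zeta: Val) -> Fraction:
    """Replace variables by mu, local variables by eta, parameters by zeta."""
    coeffs, beta = expr
    total = Fraction(beta)
    for name, alpha in coeffs.items():
        if name in mu:
            total += alpha * mu[name]
        elif name in eta:
            total += alpha * eta[name]
        else:
            total += alpha * zeta[name]      # KeyError: unknown name or eta undefined
    return total


def apply_update(pdu: Dict[str, LinExpr], mu: Val, eta: Val, zeta: Val) -> Val:
    """[mu]_{eta(zeta(PDU))}: every right-hand side is evaluated on the old mu."""
    return {v: evaluate(pdu[v], mu, eta, zeta) if v in pdu else mu[v] for v in mu}


# Example 3
mu = {"v1": Fraction(1), "v2": Fraction(2)}
eta = {"lv1": Fraction(2)}                   # eta(lv2) is undefined
zeta = {"vp1": Fraction(1)}
pdu = {"v1": ({"v1": Fraction(2), "v2": Fraction(1), "lv1": Fraction(-1), "vp1": Fraction(1)}, Fraction(0))}
print(apply_update(pdu, mu, eta, zeta))      # {'v1': Fraction(3, 1), 'v2': Fraction(2, 1)}
```

Evaluating every right-hand side on the old valuation makes the update simultaneous, which is what the case definition above expresses.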


4 Parametric Timed Data Automata 


We introduce here Parametric timed data automata (PTDAs). They can be 
seen as an extension of parametric timed automata [2] (that extend timed 
automata [1] with parameters in place of integer constants) with unbounded 
data variables and parametric variables. PTDAs can also be seen as an 
extension of some extensions of timed automata with data (see e.g., [16,19,29]), that 
we again extend with both data parameters and timing parameters, or as an 
extension of quantified event automata [12] with explicit time representation 
using clocks, and further augmented with timing parameters. PTDAs feature 
both timed guards and data guards; we summarize the various variables and 
parameters types together with their notations in Table 2. 

We will associate local variables with actions (which can be seen as predicates). 
Let $Dom : \Sigma \to 2^{LV}$ denote the set of local variables associated with each 
action. Let Var(dg) (resp. Var(PDU)) denote the set of variables occurring in dg 
(resp. PDU). 


Definition 1 (PTDA). Given a data type $(D, DE, DU)$, a parametric timed 
data automaton (PTDA) $\mathcal{A}$ over this data type is a tuple $\mathcal{A} = (\Sigma, L, \ell_0, F, C, TP, V, LV, \mu_0, VP, E)$, where: 

1. $\Sigma$ is a finite set of actions, 
2. $L$ is a finite set of locations, $\ell_0 \in L$ is the initial location, 
3. $F \subseteq L$ is the set of accepting locations, 
4. $C$ is a finite set of clocks, 
5. $TP$ is a finite set of timing parameters, 
6. $V$ (resp. $LV$) is a finite set of variables (resp. local variables) over $D$, 
7. $\mu_0$ is the initial variable valuation, 
8. $VP$ is a finite set of data parameters, 
9. $E$ is a finite set of edges $e = (\ell, tg, dg, a, R, PDU, \ell')$ where (i) $\ell, \ell' \in L$ are 
the source and target locations, (ii) $tg$ is a timed guard, (iii) $dg \in DE$ is a 
data guard such that $Var(dg) \cap LV \subseteq Dom(a)$, (iv) $a \in \Sigma$, (v) $R \subseteq C$ is a set 
of clocks to be reset, and (vi) $PDU : V \rightharpoonup DU$ is the parametric data update 
function such that $Var(PDU) \cap LV \subseteq Dom(a)$. 


The domain conditions on dg and PDU ensure that the local variables used 
in the guard (resp. update) are only those in the action signature Dom(a). 
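A minimal sketch of how these domain conditions can be checked mechanically, assuming each edge records only the sets $Var(dg)$ and $Var(PDU)$ as sets of names; the `Edge` class and `well_formed` function are hypothetical, and the guards and updates themselves are omitted.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


# Hypothetical, stripped-down edge: only what the domain conditions of Definition 1 need.
@dataclass(frozen=True)
class Edge:
    source: str
    action: str
    target: str
    guard_vars: FrozenSet[str]            # Var(dg)
    update_vars: FrozenSet[str]           # Var(PDU)
    resets: FrozenSet[str] = frozenset()  # R


def well_formed(e: Edge, lv: FrozenSet[str], dom: Dict[str, FrozenSet[str]]) -> bool:
    """Var(dg) ∩ LV ⊆ Dom(a) and Var(PDU) ∩ LV ⊆ Dom(a)."""
    allowed = dom[e.action]
    return (e.guard_vars & lv) <= allowed and (e.update_vars & lv) <= allowed


LV = frozenset({"f", "m"})
DOM = {"open": frozenset({"f", "m"}), "close": frozenset({"f"})}
ok = Edge("l0", "open", "l1", frozenset({"f", "vp"}), frozenset(), frozenset({"c"}))
bad = Edge("l1", "close", "l0", frozenset({"m"}), frozenset())   # m is not in Dom(close)
print(well_formed(ok, LV, DOM), well_formed(bad, LV, DOM))       # True False
```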


Fig. 2. Monitoring proper file opening and closing 
(a) Example of log: 
@2046 open(Hakuchi.txt, rw) 
@2136 open(Unagi.mp4, rw) 
@2166 close(Hakuchi.txt) 
(b) PTDA monitor 


Example 4. Consider the PTDA in Fig. 2b over the data type for strings. We 
have $C = \{c\}$, $TP = \{tp\}$, $V = \emptyset$ and $LV = \{f, m\}$. $Dom(open) = \{f, m\}$ while 
$Dom(close) = \{f\}$. $\ell_2$ is the only accepting location, modeling the violation of 
the specification. 

This PTDA (freely inspired by a formula from [26] further extended with 
timing parameters) monitors improper file opening and closing, i.e., a file 
already open should not be opened again, and a file that is open should not be 
closed too late. The data parameter vp is used to symbolically monitor a given 
file name, i.e., we are interested in openings and closings of this file only, while 
other files are disregarded (specified using the self-loops in $\ell_0$ and $\ell_1$, with data 
guard $f \neq vp$). Whenever f is opened (transition from $\ell_0$ to $\ell_1$), a clock c is 
reset. Then, in $\ell_1$, if f is closed within tp time units (timed guard “$c \leq tp$”), 
then the system goes back to $\ell_0$. However, if instead f is opened again, this is an 
incorrect behavior and the system enters $\ell_2$ via the upper transition. The same 
occurs if f is closed more than tp time units after opening. 


Given a data parameter valuation $\zeta$ and a timing parameter valuation $\gamma$, 
we denote by $\gamma|\zeta(\mathcal{A})$ the resulting timed data automaton (TDA), i.e., the 
non-parametric structure where all occurrences of a parameter $vp_i$ (resp. $tp_j$) have 
been replaced by $\zeta(vp_i)$ (resp. $\gamma(tp_j)$). Note that, if $V = LV = \emptyset$, then $\mathcal{A}$ is a 
parametric timed automaton [2] and $\gamma|\zeta(\mathcal{A})$ is a timed automaton [1]. 

We now equip our TDAs with a concrete semantics. 
Definition 2 (Semantics of a TDA). Given a PTDA $\mathcal{A} = (\Sigma, L, \ell_0, F, C, TP, V, LV, \mu_0, VP, E)$ over a data type $(D, DE, DU)$, a data parameter 
valuation $\zeta$ and a timing parameter valuation $\gamma$, the semantics of $\gamma|\zeta(\mathcal{A})$ is given by 
the timed transition system (TTS) $(S, s_0, \rightarrow)$, with 

- $S = L \times D^{M} \times \mathbb{R}_{\geq 0}^{H}$, $s_0 = (\ell_0, \mu_0, \vec{0})$, 

- $\rightarrow$ consists of the discrete and (continuous) delay transition relations: 

1. discrete transitions: $(\ell, \mu, \nu) \xrightarrow{e, \eta} (\ell', \mu', \nu')$, if there exist $e = (\ell, tg, dg, a, R, PDU, \ell') \in E$ and a local variable valuation $\eta$ defined exactly for $Dom(a)$, 
such that $\nu \models \gamma(tg)$, $(\mu, \eta) \models \zeta(dg)$, $\nu' = [\nu]_R$, and $\mu' = [\mu]_{\eta(\zeta(PDU))}$; 

2. delay transitions: $(\ell, \mu, \nu) \xrightarrow{d} (\ell, \mu, \nu + d)$, with $d \in \mathbb{R}_{\geq 0}$. 

Moreover we write $((\ell, \mu, \nu), (e, \eta, d), (\ell', \mu', \nu')) \in \longrightarrow$ for a combination of a 
delay and discrete transition if $\exists \nu'' : (\ell, \mu, \nu) \xrightarrow{d} (\ell, \mu, \nu'') \xrightarrow{e, \eta} (\ell', \mu', \nu')$. 

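Given a TDA $\gamma|\zeta(\mathcal{A})$ with concrete semantics $(S, s_0, \rightarrow)$, we refer to 
the states of $S$ as the concrete states of $\gamma|\zeta(\mathcal{A})$. A run of $\gamma|\zeta(\mathcal{A})$ is 
an alternating sequence of concrete states of $\gamma|\zeta(\mathcal{A})$ and triples of edges, 
local variable valuations and delays, starting from the initial state $s_0$, of 
the form $(\ell_0, \mu_0, \nu_0), (e_0, \eta_0, d_0), (\ell_1, \mu_1, \nu_1), \cdots$ with $i = 0, 1, \dots$, $e_i \in E$, 
$d_i \in \mathbb{R}_{\geq 0}$ and $((\ell_i, \mu_i, \nu_i), (e_i, \eta_i, d_i), (\ell_{i+1}, \mu_{i+1}, \nu_{i+1})) \in \longrightarrow$. Given such 
a run, the associated timed data word is $(a_1, \tau_1, \eta_1), (a_2, \tau_2, \eta_2), \cdots$, where 
$a_i$ is the action of edge $e_{i-1}$, $\eta_i$ is the local variable valuation associated 
with that transition, and $\tau_i = \sum_{0 \leq j \leq i-1} d_j$, for $i = 1, 2, \cdots$. For 
a timed data word $w$ and a concrete state $(\ell, \mu, \nu)$ of $\gamma|\zeta(\mathcal{A})$, we write 
$(\ell_0, \mu_0, \vec{0}) \xrightarrow{w} (\ell, \mu, \nu)$ in $\gamma|\zeta(\mathcal{A})$ if $w$ is associated with a run of $\gamma|\zeta(\mathcal{A})$ of 
the form $(\ell_0, \mu_0, \vec{0}), \dots, (\ell_n, \mu_n, \nu_n)$ with $(\ell_n, \mu_n, \nu_n) = (\ell, \mu, \nu)$. For a timed 
data word $w = (a_1, \tau_1, \eta_1), (a_2, \tau_2, \eta_2), \dots, (a_n, \tau_n, \eta_n)$, we denote $|w| = n$ 
and for any $i \in \{1, 2, \dots, n\}$, we denote $w(1, i) = (a_1, \tau_1, \eta_1), (a_2, \tau_2, \eta_2), \dots, 
(a_i, \tau_i, \eta_i)$. 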


A finite run is accepting if its last state $(\ell, \mu, \nu)$ is such that $\ell \in F$. The 
language $\mathcal{L}(\gamma|\zeta(\mathcal{A}))$ is defined to be the set of timed data words associated with 
all accepting runs of $\gamma|\zeta(\mathcal{A})$. 


Example 5. Consider the PTDA in Fig. 2b over the data type for strings. Let 
$\gamma(tp) = 100$ and $\zeta(vp) = \mathrm{Hakuchi.txt}$. An accepting run of the TDA $\gamma|\zeta(\mathcal{A})$ 
is: $(\ell_0, \emptyset, \nu_0), (e_0, \eta_0, 2046), (\ell_1, \emptyset, \nu_1), (e_1, \eta_1, 90), (\ell_1, \emptyset, \nu_2), (e_2, \eta_2, 30), (\ell_2, \emptyset, \nu_3)$, 
where $\emptyset$ denotes a variable valuation over an empty domain (recall that $V = \emptyset$ 
in Fig. 2b), $\nu_0(c) = 0$, $\nu_1(c) = 0$, $\nu_2(c) = 90$, $\nu_3(c) = 120$, $e_0$ is the upper edge 
from $\ell_0$ to $\ell_1$, $e_1$ is the self-loop above $\ell_1$, $e_2$ is the lower edge from $\ell_1$ to $\ell_2$, 
$\eta_0(f) = \eta_2(f) = \mathrm{Hakuchi.txt}$, $\eta_1(f) = \mathrm{Unagi.mp4}$, $\eta_0(m) = \eta_1(m) = \mathrm{rw}$, and 
$\eta_2(m)$ is undefined (because $Dom(close) = \{f\}$). 

The associated timed data word is $(open, 2046, \eta_0), (open, 2136, \eta_1), 
(close, 2166, \eta_2)$. 

Since each action is associated with a set of local variables, given an ordering 
on this set, it is possible to see a given action and a local variable valuation as a 
predicate: for example, assuming an ordering of LV such that f precedes m, then open 
with $\eta_0$ can be represented as open(Hakuchi.txt, rw). Using this convention, the 
log in Fig. 2a corresponds exactly to this timed data word. 
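With the parameters fixed as in Example 5, the TDA $\gamma|\zeta(\mathcal{A})$ of Fig. 2b can be simulated directly. The sketch below is a hand-written simulation of that one automaton, not a generic TDA interpreter. It assumes the guard “c ≤ tp” on timely closing and, since the edges of Fig. 2b are not all described in the text, that closing the monitored file in ℓ0 leaves the automaton in ℓ0; `first_violation` is a hypothetical name.

```python
from typing import List, Optional, Tuple

# The log of Fig. 2a as (timestamp, action, file name); the mode m plays no role here.
LOG: List[Tuple[float, str, str]] = [
    (2046, "open", "Hakuchi.txt"),
    (2136, "open", "Unagi.mp4"),
    (2166, "close", "Hakuchi.txt"),
]


def first_violation(log: List[Tuple[float, str, str]], vp: str, tp: float) -> Optional[float]:
    """Simulate the TDA gamma|zeta(A) of Fig. 2b; return the time at which l2 is reached."""
    loc, reset_at = "l0", 0.0            # reset_at: time of the last reset of clock c
    for t, action, f in log:
        if f != vp:                      # self-loops with data guard f != vp
            continue
        if loc == "l0" and action == "open":
            loc, reset_at = "l1", t      # c := 0
        elif loc == "l1" and action == "open":
            return t                     # opened again: upper edge to l2
        elif loc == "l1" and action == "close":
            if t - reset_at <= tp:       # assumed guard c <= tp
                loc = "l0"
            else:
                return t                 # closed too late: edge to l2
        # assumed: close(vp) in l0 leaves the automaton in l0
    return None


print(first_violation(LOG, "Hakuchi.txt", 100))   # 2166
print(first_violation(LOG, "Hakuchi.txt", 120))   # None
```

The first call reaches ℓ2 at time 2166 with c = 120 > 100, which is the accepting run of Example 5.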


5 Symbolic Monitoring Against PTDA Specifications 


In symbolic monitoring, in addition to the (observable) actions in $\Sigma$, we employ 
unobservable actions denoted by $\varepsilon$ and satisfying $Dom(\varepsilon) = \emptyset$. We write $\Sigma_\varepsilon$ 
for $\Sigma \cup \{\varepsilon\}$. We let $\eta_\varepsilon$ be the local variable valuation such that $\eta_\varepsilon(lv)$ is 
undefined for any $lv \in LV$. For a timed data word $w = (a_1, \tau_1, \eta_1), (a_2, \tau_2, \eta_2), \dots, 
(a_n, \tau_n, \eta_n)$ over $\Sigma_\varepsilon$, the projection $w|_\Sigma$ is the timed data word over $\Sigma$ 
obtained from $w$ by removing any triple $(a_i, \tau_i, \eta_i)$ where $a_i = \varepsilon$. An edge 
$e = (\ell, tg, dg, a, R, PDU, \ell') \in E$ is unobservable if $a = \varepsilon$, and observable 
otherwise. The use of unobservable actions allows us to encode parametric timed 
pattern matching (see Sect. 5.3). 
We make the following assumption on the PTDAs in symbolic monitoring. 


Assumption 1. The PTDA A does not contain any loop of unobservable edges. 


5.1 Problem Definition 


Roughly speaking, given a PTDA $\mathcal{A}$ and a timed data word $w$, the symbolic 
monitoring problem asks for the set of pairs $(\gamma, \zeta) \in (\mathbb{Q}_+)^{TP} \times D^{VP}$ satisfying 
$w(1, i) \in \mathcal{L}(\gamma|\zeta(\mathcal{A}))$, where $w(1, i)$ is a prefix of $w$. Since $\mathcal{A}$ also contains 
unobservable edges, we consider $w'$, which is $w$ augmented by unobservable actions. 


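Symbolic monitoring problem: 

INPUT: a PTDA $\mathcal{A}$ over a data type $(D, DE, DU)$ and actions $\Sigma_\varepsilon$, and a 
timed data word $w$ over $\Sigma$ 

PROBLEM: compute all the pairs $(\gamma, \zeta)$ of timing and data parameter 
valuations such that there is a timed data word $w'$ over $\Sigma_\varepsilon$ and $i \in \{1, 2, \dots, |w'|\}$ 
satisfying $w'|_\Sigma = w$ and $w'(1, i) \in \mathcal{L}(\gamma|\zeta(\mathcal{A}))$. That is, it requires the 
validity domain $\mathcal{D}(w, \mathcal{A}) = \{(\gamma, \zeta) \mid \exists w', \exists i \in \{1, 2, \dots, |w'|\}, w'|_\Sigma = w \text{ and } w'(1, i) \in \mathcal{L}(\gamma|\zeta(\mathcal{A}))\}$. 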


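Example 6. Consider the PTDA $\mathcal{A}$ and the timed data word $w$ shown in Fig. 1. 
The validity domain $\mathcal{D}(w, \mathcal{A})$ is $\mathcal{D}(w, \mathcal{A}) = \mathcal{D}_1 \cup \mathcal{D}_2$, where 

$$\mathcal{D}_1 = \{(\gamma, \zeta) \mid 0 \leq \gamma(tp) < 2, \zeta(px) = c\} \quad\text{and}\quad \mathcal{D}_2 = \{(\gamma, \zeta) \mid 0 \leq \gamma(tp) < 1, \zeta(px) = a\}.$$ 

For $w' = w(1, 3) \cdot (\varepsilon, \eta_\varepsilon, 2.9)$, we have $w' \in \mathcal{L}(\gamma|\zeta(\mathcal{A}))$ and $w'|_\Sigma = w(1, 3)$, 
where $\gamma$ and $\zeta$ are such that $\gamma(tp) = 1.8$ and $\zeta(px) = c$, and $w(1, 3) \cdot (\varepsilon, \eta_\varepsilon, 2.9)$ 
denotes the juxtaposition. 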
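The prefix and projection operations used in this example can be sketched in Python as follows, on the first four events of the log of Fig. 1a. The encoding of triples, the marker `EPS` for ε, and the local-variable names x and v of update(x, v) are our assumptions for illustration.

```python
from typing import Dict, List, Tuple

EPS = "eps"                                      # stands for the unobservable action
Triple = Tuple[str, float, Dict[str, object]]    # (action, timestamp, local variable valuation)


def prefix(w: List[Triple], i: int) -> List[Triple]:
    """w(1, i): the first i triples of w."""
    return w[:i]


def project(w: List[Triple]) -> List[Triple]:
    """w|_Sigma: drop every triple whose action is unobservable."""
    return [triple for triple in w if triple[0] != EPS]


# First four events of the log of Fig. 1a
W = [("update", 0, {"x": "a", "v": 0}), ("update", 1, {"x": "c", "v": 1}),
     ("update", 2, {"x": "a", "v": 0}), ("update", 3, {"x": "b", "v": 1})]
W_PRIME = prefix(W, 3) + [(EPS, 2.9, {})]        # w(1,3) . (eps, eta_eps, 2.9)
print(project(W_PRIME) == prefix(W, 3))          # True
```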


For the data types in Example 2, the validity domain D(w, A) can be rep- 
resented by a constraint of finite size because the length |w| of the timed data 
word is finite. 
5.2 Online Algorithm 


Our algorithm is online in the sense that it outputs $(\gamma, \zeta) \in \mathcal{D}(w, \mathcal{A})$ as soon as 
its membership is witnessed, even before reading the whole timed data word $w$. 

Let $w = (a_1, \tau_1, \eta_1), (a_2, \tau_2, \eta_2), \dots, (a_n, \tau_n, \eta_n)$ and $\mathcal{A}$ be the timed data word 
and PTDA given in symbolic monitoring, respectively. Intuitively, after reading 
$(a_i, \tau_i, \eta_i)$, our algorithm symbolically computes for all parameter valuations 
$(\gamma, \zeta) \in (\mathbb{Q}_+)^{TP} \times D^{VP}$ the concrete states $(\ell, \nu, \mu)$ satisfying $(\ell_0, \mu_0, \vec{0}) \xrightarrow{w(1,i)} (\ell, \mu, \nu)$ 
in $\gamma|\zeta(\mathcal{A})$. Since $\mathcal{A}$ has unobservable edges as well as observable edges, 
we have to add unobservable actions before or after observable actions in $w$. By 
$Conf^o_i$, we denote the configurations after reading $(a_i, \tau_i, \eta_i)$ where no unobservable 
actions are appended after $(a_i, \tau_i, \eta_i)$. By $Conf^u_i$, we denote the configurations 
after reading $(a_i, \tau_i, \eta_i)$ where at least one unobservable action is appended after 
$(a_i, \tau_i, \eta_i)$. 


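Definition 3 ($Conf^o_i$, $Conf^u_i$). For a PTDA $\mathcal{A}$ over actions $\Sigma_\varepsilon$, a timed data 
word $w$ over $\Sigma$, and $i \in \{0, 1, \dots, |w|\}$ (resp. $i \in \{-1, 0, \dots, |w|\}$), $Conf^o_i$ 
(resp. $Conf^u_i$) is the set of 5-tuples $(\ell, \nu, \gamma, \mu, \zeta)$ such that there is a timed data 
word $w'$ over $\Sigma_\varepsilon$ satisfying the following: (i) $(\ell_0, \mu_0, \vec{0}) \xrightarrow{w'} (\ell, \mu, \nu)$ in $\gamma|\zeta(\mathcal{A})$, 
(ii) $w'|_\Sigma = w(1, i)$, (iii) the last action $a_{|w'|}$ of $w'$ is observable (resp. 
unobservable and its timestamp is less than $\tau_{i+1}$). 


Algorithm 1. Outline of our algorithm for symbolic monitoring 
Input: A PTDA $\mathcal{A} = (\Sigma, L, \ell_0, F, C, TP, V, LV, \mu_0, VP, E)$ over a data 
type $(D, DE, DU)$ and actions $\Sigma_\varepsilon$, and a timed data 
word $w = (a_1, \tau_1, \eta_1), (a_2, \tau_2, \eta_2), \dots, (a_n, \tau_n, \eta_n)$ over $\Sigma$ 
Output: $\bigcup_{i \in \{1, 2, \dots, n+1\}} Result_i$ is the validity domain $\mathcal{D}(w, \mathcal{A})$ 
1  $Conf^u_{-1} \leftarrow \emptyset$; $Conf^o_0 \leftarrow \{(\ell_0, \vec{0}, \gamma, \mu_0, \zeta) \mid \gamma \in (\mathbb{Q}_+)^{TP}, \zeta \in D^{VP}\}$ 
2  for $i \leftarrow 1$ to $n$ do 
3      compute $(Conf^u_{i-1}, Conf^o_i)$ from $(Conf^u_{i-2}, Conf^o_{i-1})$ 
4      $Result_i \leftarrow \{(\gamma, \zeta) \mid \exists (\ell, \nu, \gamma, \mu, \zeta) \in Conf^u_{i-1} \cup Conf^o_i.\ \ell \in F\}$ 
5  compute $Conf^u_n$ from $(Conf^u_{n-1}, Conf^o_n)$ 
6  $Result_{n+1} \leftarrow \{(\gamma, \zeta) \mid \exists (\ell, \nu, \gamma, \mu, \zeta) \in Conf^u_n.\ \ell \in F\}$ 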


Algorithm 1 shows an outline of our algorithm for symbolic monitoring 
(see [35] for the full version). Our algorithm incrementally computes $Conf^u_{i-1}$ and 
$Conf^o_i$ (line 3). After reading $(a_i, \tau_i, \eta_i)$, our algorithm stores the partial results 
$(\gamma, \zeta) \in \mathcal{D}(w, \mathcal{A})$ witnessed from the accepting configurations in $Conf^u_{i-1}$ and 
$Conf^o_i$ (line 4). (We also need to try to take potential unobservable transitions 
and store the results from the accepting configurations after the last element of 
the timed data word (lines 5 and 6).) 

Since $(\mathbb{Q}_+)^{TP} \times D^{VP}$ is an infinite set, we cannot try each $(\gamma, \zeta) \in (\mathbb{Q}_+)^{TP} \times D^{VP}$ 
and we use a symbolic representation for parameter valuations. Similarly to the 
reachability synthesis of parametric timed automata [28], a set of clock and 
timing parameter valuations can be represented by a convex polyhedron. For variable 
valuations and data parameter valuations, we need an appropriate 
representation depending on the data type $(D, DE, DU)$. Moreover, for the termination of 
Algorithm 1, some operations on the symbolic representation are required. 


Theorem 1 (termination). For any PTDA $\mathcal{A}$ over a data type $(D, DE, DU)$ 
and actions $\Sigma_\varepsilon$, and for any timed data word $w$ over $\Sigma$, Algorithm 1 terminates 
if the following operations on the symbolic representation $V_d$ of a set of variable 
and data parameter valuations terminate. 

1. restriction and update $\{([\mu]_{\eta(\zeta(PDU))}, \zeta) \mid (\mu, \zeta) \in V_d, (\mu, \eta) \models \zeta(dg)\}$, where 
$\eta$ is a local variable valuation, $PDU$ is a parametric data update function, and 
$dg$ is a data guard; 

2. emptiness checking of $V_d$; 

3. projection $V_d|_{VP}$ of $V_d$ to the data parameters $VP$. 


Example 7. For the data type for rationals in Example 2, variable and data 
parameter valuations $V_d$ can be represented by convex polyhedra and the above 
operations terminate. For the data type for strings $\mathbb{S}$ in Example 2, variable and 
data parameter valuations $V_d$ can be represented by $\mathbb{S}^{|V|} \times (\mathbb{S} \cup \mathcal{P}_{fin}(\mathbb{S}))^{|VP|}$ and 
the above operations terminate, where $\mathcal{P}_{fin}(\mathbb{S})$ is the set of finite sets of $\mathbb{S}$. 
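To illustrate this kind of representation on a small case, the following Python sketch monitors the PTDA of Fig. 2b symbolically. Our reading of the string representation is that each data parameter is kept either as an equality vp = s or as a finite set of excluded strings; the single timing parameter tp is kept as an interval lo ≤ tp < hi, the one-dimensional case of a convex polyhedron. This is a hand-specialized toy, not SYMON's algorithm or data structures: it assumes the timely-close guard is c ≤ tp, assumes that closing the monitored file in ℓ0 is ignored, and never merges configurations. All names are hypothetical.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical encoding: a constraint on the string parameter vp is ("eq", s),
# meaning vp = s, or ("neq", S), meaning vp is none of the strings in the finite set S.
VpCons = Tuple[str, object]
TpCons = Tuple[float, float]                 # (lo, hi) encodes lo <= tp < hi
Config = Tuple[str, float, VpCons, TpCons]   # (location, last reset of c, vp, tp)

LOG = [(2046, "open", "Hakuchi.txt"), (2136, "open", "Unagi.mp4"), (2166, "close", "Hakuchi.txt")]


def with_eq(vp: VpCons, f: str) -> Optional[VpCons]:
    """Intersect with vp = f; None if empty."""
    kind, val = vp
    if kind == "eq":
        return vp if val == f else None
    return None if f in val else ("eq", f)


def with_neq(vp: VpCons, f: str) -> Optional[VpCons]:
    """Intersect with vp != f; None if empty."""
    kind, val = vp
    if kind == "eq":
        return None if val == f else vp
    return ("neq", val | {f})


def monitor(log) -> List[Tuple[float, VpCons, TpCons]]:
    """Report (time, vp constraint, tp constraint) for each violation, as soon as it is seen."""
    confs: List[Config] = [("l0", 0.0, ("neq", frozenset()), (0.0, math.inf))]
    results = []
    for t, action, f in log:
        nxt: List[Config] = []
        for loc, reset_at, vp, (lo, hi) in confs:
            other = with_neq(vp, f)                  # self-loops with guard f != vp
            if other is not None:
                nxt.append((loc, reset_at, other, (lo, hi)))
            same = with_eq(vp, f)                    # edges with guard f = vp
            if same is None:
                continue
            c = t - reset_at                         # value of clock c at time t
            if loc == "l0" and action == "open":
                nxt.append(("l1", t, same, (lo, hi)))        # reset c
            elif loc == "l0":
                nxt.append((loc, reset_at, same, (lo, hi)))  # assumed: close in l0 ignored
            elif action == "open":
                results.append((t, same, (lo, hi)))          # opened twice
            else:
                if max(lo, c) < hi:                          # c <= tp: back to l0
                    nxt.append(("l0", reset_at, same, (max(lo, c), hi)))
                if lo < min(hi, c):                          # c > tp: closed too late
                    results.append((t, same, (lo, min(hi, c))))
        confs = nxt
    return results


print(monitor(LOG))   # [(2166, ('eq', 'Hakuchi.txt'), (0.0, 120))]
```

On the log of Fig. 2a it reports a single violation at time 2166 under vp = Hakuchi.txt ∧ 0 ≤ tp < 120, which is consistent with the accepting run of Example 5 for tp = 100.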


Fig. 3. PTDAs in DOMINANT (left) and PERIODIC (right) 


5.3 Encoding Parametric Timed Pattern Matching 


The symbolic monitoring problem is a generalization of the parametric timed 
pattern matching problem of [4]. Recall that parametric timed pattern matching 
aims at synthesizing timing parameter valuations and start and end times in the 
log for which a log segment satisfies or violates a specification. In our approach, 
by adding a clock measuring the absolute time, and two timing parameters 
encoding respectively the start and end date of the segment, one can easily infer 
the log segments for which the property is satisfied. 

Consider the DOMINANT PTDA (left of Fig. 3). It is inspired by a 
monitoring of withdrawals from bank accounts of various users [15]. This PTDA 
monitors situations when a user withdraws more than half of the total 
withdrawals within a time window of (50,100). The actions are $\Sigma = \{withdraw\}$ 
and $Dom(withdraw) = \{n, a\}$, where n has a string value and a has an 
integer value. The string n represents a user name and the integer a represents the 
amount of the withdrawal by the user n. Observe that clock c is never reset, 
and therefore measures absolute time. The automaton can non-deterministically 
remain in $\ell_0$, or start to measure a log by taking the $\varepsilon$-transition to $\ell_1$ checking 
$c = tp_1$, and therefore “remembering” the start time using timing parameter $tp_1$. 
Then, whenever a user vp has withdrawn more than half of the accumulated 
withdrawals (data guard $2v_1 > v_2$) in a (50,100) time window (timed guard 
$c - tp_1 \in (50, 100)$), the automaton takes an $\varepsilon$-transition to the accepting 
location, checking $c = tp_2$, and therefore remembering the end time using timing 
parameter $tp_2$. 
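SYMON synthesizes the whole set of parameter valuations for DOMINANT; the brute-force Python sketch below only checks one concrete valuation (vp, tp1, tp2), to make the two guards explicit. We assume that withdrawals with timestamps in the closed window [tp1, tp2] are counted; the exact treatment of events that coincide with the ε-transitions depends on the automaton of Fig. 3 and may differ. The function name and the log are hypothetical, not benchmark data.

```python
from typing import List, Tuple

# (timestamp, user name n, amount a): hypothetical withdrawals, not data from the benchmark
Withdrawal = Tuple[float, str, int]
LOG_W: List[Withdrawal] = [(10, "alice", 300), (30, "bob", 100), (70, "alice", 50), (120, "bob", 500)]


def dominant(log: List[Withdrawal], vp: str, tp1: float, tp2: float) -> bool:
    """Does user vp withdraw more than half of all withdrawals in the window [tp1, tp2]?"""
    if not 50 < tp2 - tp1 < 100:                                    # timed guard c - tp1 in (50, 100)
        return False
    v1 = sum(a for t, n, a in log if tp1 <= t <= tp2 and n == vp)   # withdrawals of user vp
    v2 = sum(a for t, n, a in log if tp1 <= t <= tp2)               # all withdrawals
    return 2 * v1 > v2                                              # data guard 2 v1 > v2


print(dominant(LOG_W, "alice", 0, 80))    # True: 2 * 350 > 450
print(dominant(LOG_W, "bob", 40, 125))    # True: 2 * 500 > 550
print(dominant(LOG_W, "alice", 0, 40))    # False: window too short
```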


6 Experiments 


We implemented our symbolic monitoring algorithm in a tool SYMON in C++, 
where the data domains are strings and integers. Our tool SYMON 
is distributed at https://github.com/MasWag/symon. We use PPL [8] for the 
symbolic representation of the valuations. We note that we employ an optimiza- 
tion to merge adjacent polyhedra in the configurations if possible. We evaluated 
our monitor algorithm against three original benchmarks: Copy in Fig. 1c; and 
DOMINANT and PERIODIC in Fig. 3. We conducted experiments on an Amazon 
EC2 c4.large instance (2.9 GHz Intel Xeon E5-2666 v3, 2 vCPUs, and 3.75 GiB 
RAM) that runs Ubuntu 18.04 LTS (64 bit). 


6.1 Benchmark 1: Copy 


Our first benchmark Copy is a monitoring of variable updates much like the 
scenario in [18]. The actions are Σ = {update} and Dom(update) = {n, v}, 
where n has a string value representing the name of the updated variable and 
v has an integer value representing the updated value. Our set consists of 10 
timed data words of length 4,000 to 40,000. 

The PTDA in Copy is shown in Fig. 1c, where we give an additional con- 
straint 3 < tp < 10 on tp. The property encoded in Fig. 1c is “for any variable px, 
whenever an update of that variable occurs, then within tp time units, the value 
of b must be equal to that update”. 

The experiment result is in Fig. 4. We observe that the execution time is linear 
in the number of events and the memory usage is more or less constant with 
respect to the number of events. 


6.2 Benchmark 2: Dominant 


Our second benchmark is DOMINANT (Fig. 3, left). Our set consists of 10 timed
data words of length 2,000 to 20,000. The experimental results are shown in
Fig. 5. We observe that the execution time is linear in the number of events and
the memory usage is more or less constant with respect to the number of events.

Fig. 5. Execution time (left) and memory usage (right) of DOMINANT and PERIODIC


6.3 Benchmark 3: Periodic 


Our third benchmark, PERIODIC, is inspired by the parameter identification of
periodic withdrawals from one bank account. The actions are $\Sigma = \{\mathit{withdraw}\}$
and $\mathit{Dom}(\mathit{withdraw}) = \{a\}$, where $a$ has an integer value representing the
amount of the withdrawal. We randomly generated a set of 10 timed data words
of length 2,000 to 20,000. Each timed data word consists of the following three
kinds of periodic withdrawals:

shortperiod: One withdrawal occurs every 5 ± … time units. The amount of
the withdrawal is 50 ± 3.
middleperiod: One withdrawal occurs every 50 ± 3 time units. The amount
of the withdrawal is 1000 ± 40.
longperiod: One withdrawal occurs every 100 ± 5 time units. The amount of
the withdrawal is 5000 ± 20.

The PTDA in PERIODIC is shown on the right of Fig. 3. The PTDA matches
situations where, for any two successive withdrawals of amount more than $vp$,
the duration between them is within $[tp_1, tp_2]$. By symbolic monitoring, one can
identify that the period of the periodic withdrawals of amount greater than $vp$
is in $[tp_1, tp_2]$. An example of the validity domain is shown in the figure on the
right.
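For a fixed threshold vp, the durations that the PERIODIC pattern compares against [tp1, tp2] can be computed directly. The Python sketch below returns the tightest interval containing all gaps between successive withdrawals above vp, where "successive" means successive among the filtered withdrawals (our assumption). The log format and function name are hypothetical; SYMON instead computes the whole validity domain symbolically over vp, tp1, and tp2.

```python
from typing import List, Optional, Tuple

Withdrawal = Tuple[float, int]  # (timestamp, amount a); hypothetical log format


def period_bounds(log: List[Withdrawal], vp: int) -> Optional[Tuple[float, float]]:
    """Tightest [tp1, tp2] containing every gap between successive withdrawals
    whose amount exceeds vp; None if there are fewer than two such withdrawals."""
    times = [t for t, a in sorted(log) if a > vp]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    if not gaps:
        return None
    return min(gaps), max(gaps)


log = [(0, 5010), (3, 49), (8, 51), (101, 4990), (104, 50), (199, 5005)]
print(period_bounds(log, vp=1000))  # (98, 101)
print(period_bounds(log, vp=40))    # (3, 95)
```

Raising vp filters out the short-period withdrawals, so the returned interval moves toward the long period; this dependence between vp and [tp1, tp2] is what the validity domain captures.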

The experimental results are shown in Fig. 5. We observe that the execution time
is linear in the number of events and the memory usage is more or less constant
with respect to the number of events.


6.4 Discussion 


First, a positive result is that our algorithm effectively performs symbolic
monitoring on more than 10,000 actions in one or two minutes, even though the
PTDAs feature both timing and data parameters. The execution time in COPY
is 50 to 100 times shorter than in DOMINANT and PERIODIC. This is because
the constraint $3 < tp < 10$ in COPY is tight, so the configuration sets maintained
by Algorithm 1 stay small. Another positive result is that, in all of the benchmarks,
the execution time is linear and the memory usage is more or less constant in the
size of the input word. This is because the size of the configuration sets in
Algorithm 1 is bounded, for the following reasons. In DOMINANT, the loop at $\ell_1$
of the PTDA is deterministic, and because of the guard $c - tp_1 \in (50, 100)$ on the
edge from $\ell_1$ to $\ell_2$, the number of loop edges taken at $\ell_1$ in an accepting run is
bounded (provided the duration between two consecutive actions is bounded, as
in the current setting). Therefore, the configuration sets in Algorithm 1 have
bounded size. The reason is similar for COPY. In PERIODIC, since the PTDA is
deterministic and the withdrawal amounts take only finitely many values, the
configuration sets in Algorithm 1 are again bounded.

It is clear that one can design ad-hoc automata for which the execution time
of symbolic monitoring grows much faster (e.g., exponentially in the size of the
input word). However, our experiments show that our algorithm monitors various
interesting properties in reasonable time.

COPY and DOMINANT use data and timing parameters as well as memory
and aggregation; as Table 1 shows, no other monitoring tool can compute the
valuations satisfying the specification. We nevertheless used the parametric timed
model checker IMITATOR [3] to attempt such a synthesis, by encoding the input
log as a separate automaton; however, IMITATOR ran out of memory (on a
computer with 3.75 GiB RAM) for DOMINANT with $|w| = 2000$, while SYMON
terminates in 14 s using only 6.9 MiB for the same benchmark. Concerning
PERIODIC, the only existing work that can possibly accommodate this specification
is [7]. While a precise performance comparison is interesting future work (their
implementation is not publicly available), we do not expect our implementation
to be vastly outperformed: in [7], their tool times out (after 10 min) for a simple
specification ($\mathbf{F}_{[0,s_2]}\mathbf{G}_{[0,s_1]}(x < p)$) and a signal discretized by only 128 points.

For those problem instances that MONPOLY and DEJAVU can accommodate
(which are simpler and less parametrized than our benchmarks), they tend
to run much faster than ours. For example, in [26], it is reported that they can
process a trace of length 1,100,004 in 30.3 s. The trade-off here is expressivity: for
example, DEJAVU does not seem to accommodate DOMINANT, because DEJAVU
does not allow for aggregation. We also note that, while SYMON can be slower
than MONPOLY and DEJAVU, it is fast enough for many real-world online
monitoring scenarios.


7 Conclusion and Perspectives 


We proposed a symbolic framework for monitoring using parameters both in data 
and time. Logs can use timestamps and infinite domain data, while our monitor 
automata can use timing and variable parameters (in addition to clocks and 
local variables). In addition, our online algorithm can answer symbolically, by 
outputting all valuations (and possibly log segments) for which the specification 
is satisfied or violated. We implemented our approach in a prototype, SYMON,
and experiments showed that our tool can effectively monitor logs of tens of
thousands of events in a short time.


Perspectives. Combining the BDDs used in [26] with some of our data types 
(typically strings) could improve our approach by making it even more symbolic. 
Also, taking advantage of the polarity of some parameters (typically the timing
parameters, in the line of [17]) could further improve efficiency.

We considered infinite domains, but the case of finite domains raises inter- 
esting questions concerning result representation: if the answer to a property is 
“neither a nor b”, knowing the domain is {a,b,c}, then the answer should be c. 
Abstract. Stochastic model checking is a technique for analyzing systems that
possess probabilistic characteristics. However, its scalability is limited as
probabilistic models of real-world applications typically have very large or infinite
state spaces. This paper presents a new infinite-state CTMC model checker, STAMINA,
with improved scalability. It uses a novel state space approximation method to 
reduce large and possibly infinite state CTMC models to finite state representa- 
tions that are amenable to existing stochastic model checkers. It is integrated with 
a new property-guided state expansion approach that improves the analysis accu- 
racy. Demonstration of the tool on several benchmark examples shows promising 
results in terms of analysis efficiency and accuracy compared with a state-of-the- 
art CTMC model checker that deploys a similar approximation method. 


Keywords: Stochastic model checking · Infinite-state · Markov chains


1 Introduction 


Stochastic model checking is a formal method that designers and engineers can use to 
determine the likelihood of safety and liveness properties. Checking properties using 
numerical model checking techniques requires enumerating the state space of the sys- 
tem to determine the probability that the system is in any given state at a desired 
time [17]. Real-world applications often have very large or even infinite state spaces. 
Numerous state representation, reduction, and approximation methods have been 
proposed. Symbolic model checking based on multi-terminal binary decision diagrams 
(MTBDDs) [23] has achieved success in representing large Markov Decision Process 
(MDP) models with a few distinct probabilistic choices at each state, e.g., the shared 
coin protocol [3]. MTBDDs, however, are often inefficient for models with many differ- 
ent and distinct probability/rate values due to the inefficient representation of solution 
vectors. Continuous-time Markov chain (CTMC) models, whose state transition rate is a
function of state variables, generally contain many distinct rate values. As a result, sym- 
bolic model checkers can run out of memory while verifying a typical CTMC model 
with as few as 73,000 states [23]. State reduction techniques, such as bisimulation min- 
imization [7,8,14], abstraction [6, 12, 14,20], symmetry reduction [5,16], and partial 
order reduction [9] have been mainly extended to discrete-time, finite-state probabilis- 
tic systems. The three-valued abstraction [14] can reduce large, finite-state CTMCs. It 
may, however, provide inconclusive verification results due to abstraction. 

To the best of our knowledge, only a few tools can analyze infinite-state probabilistic 
models, namely, STAR [19] and INFAMY [10]. The STAR tool primarily analyzes bio- 
chemical reaction networks. It approximates solutions to the chemical master equation 
(CME) using the method of conditional moments (MCM) [11] that combines moment- 
based and state-based representations of probability distributions. This hybrid approach 
represents species with low concentrations using a discrete stochastic description and 
numerically integrates a small master equation using the fourth order Runge-Kutta 
method over a small time interval [2]; and solves a system of conditional moment equa- 
tions for higher concentration species, conditioned on the low concentration species. 
This method has been optimized to drop unlikely states and add likely states on-the-fly. 
STAR relies on a well-structured underlying Markov process with small sensitivity on 
the transient distribution. Also, it mainly reports state reachability probabilities, instead 
of checking a given probabilistic property. INFAMY is a truncation-based approach that
explores the model's state space up to a certain finite depth k, performing a breadth-first
state search from the initial state. The truncated state space still grows exponentially
with respect to the exploration depth. The error probability computed during model
checking depends on the depth of state exploration; therefore, a higher exploration depth
generally yields a lower error probability.

This paper presents a new infinite-state stochastic model checker, STochastic 
Approximate Model-checker for INfinite-state Analysis (STAMINA). Our tool also takes 
a truncation-based approach. In particular, it maintains a probability estimate of each 
path being explored in the state space, and when the currently explored path probabil- 
ity drops below a specified threshold, it halts exploration of this path. All transitions 
exiting this state are redirected to an absorbing state. After all paths have been explored 
or truncated, transient Markov chain analysis is applied to determine the probability of 
a transient property of interest specified using Continuous Stochastic Logic (CSL) [4]. 
The calculated probability forms a lower bound on the probability, while the upper 
bound also includes the probability of the absorbing state. The actual probability of the 
CSL property is guaranteed to be within this range. An initial version of our tool and 
preliminary results are reported in [22]. Since that paper, our tool has been tightly inte- 
grated within the PRISM model checker [18] to improve performance, and we have also 
developed a new property-guided state expansion technique to expand the state space 
to tighten the reported probability range incrementally. This paper reports our results, 
which show significant improvement on both efficiency and verification accuracy over 
several non-trivial case studies from various application domains. 
2 STAMINA


Figure 1 presents the architecture of STAMINA. Based on a user-specified probability
threshold $\kappa$ (kappa), it first constructs a finite-state CTMC model $\mathcal{C}_{\downarrow\kappa}$ from the original
infinite-state CTMC model $\mathcal{C}$ using the state space approximation method presented in
Sect. 2.1. $\mathcal{C}_{\downarrow\kappa}$ is then checked using the PRISM explicit-state model checker against a
given CSL property $P_{\sim p}(\phi)$, where $\sim\, \in \{<, >, \le, \ge\}$ and $p \in [0,1]$ (for cases where
it is desired that a predicate be true within a certain probability bound), or $P_{=?}(\phi)$ (for
cases where the exact probability of the predicate being true is to be calculated).
Lower- and upper-bound probabilities that $\phi$ holds, namely $P_{min}$ and $P_{max}$, are
then obtained, and their difference, i.e., $(P_{max} - P_{min})$, is the probability accumulated
in the absorbing state $x_{abs}$, which abstracts all the states not included in the current state
space. If $p \in [P_{min}, P_{max}]$, it is not known whether $P_{\sim p}(\phi)$ holds. If the exact probability
is of interest and the probability range is larger than the user-defined precision $\epsilon$, i.e.,
$(P_{max} - P_{min}) > \epsilon$, then the method does not give a meaningful result.


Fig. 1. Architecture of STAMINA.


For an inconclusive verification result from the previous step, STAMINA applies
a property-guided approach, described in Sect. 2.2, to further expand $\mathcal{C}_{\downarrow\kappa}$, provided
$\phi$ is a non-nested "until" formula; otherwise, it uses the previous method to
expand the state space. Note that $\kappa$ is also reduced by the reduction factor, so that
states that were previously ignored due to a low probability estimate can be included in
the current state expansion. The expanded CTMC model $\mathcal{C}_{\downarrow\kappa}$ is then checked to obtain a
new probability bound $[P_{min}, P_{max}]$. This iterative process repeats until one of the
following conditions holds: (1) the target probability $p$ falls outside the probability bound
$[P_{min}, P_{max}]$, (2) the probability bound is sufficiently small, i.e., $(P_{max} - P_{min}) < \epsilon$,
or (3) the maximal number of iterations $N$ has been reached ($r > N$).
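The iteration above can be summarized by the following Python sketch of the control loop only. `check` is a hypothetical stand-in for building the truncated model for a given threshold (property-guided or not) and running PRISM on it; the defaults mirror the settings reported in Sect. 3 (at most 10 iterations, reduction factor 1000, precision 10^-3) and the initial threshold 10^-3 used for the toggle switch, but the function itself is illustrative, not STAMINA's API. For a P=? query, condition (1) would simply be skipped.

```python
from typing import Callable, Tuple

Bounds = Tuple[float, float]


def refine(check: Callable[[float], Bounds], p: float, kappa: float = 1e-3,
           reduction: float = 1000.0, eps: float = 1e-3, max_iters: int = 10):
    """Expand-and-check loop with the three stopping conditions described above.
    check(kappa) is a hypothetical stand-in returning (p_min, p_max)."""
    p_min, p_max = 0.0, 1.0
    for it in range(1, max_iters + 1):
        p_min, p_max = check(kappa)
        if not (p_min <= p <= p_max):   # (1) p outside the bound: P~p(phi) is decided
            return p_min, p_max, it, "decided"
        if p_max - p_min < eps:         # (2) bound is tight enough
            return p_min, p_max, it, "precise"
        kappa /= reduction              # admit states with lower probability estimates
    return p_min, p_max, max_iters, "iteration limit"  # (3) N iterations reached


def demo_check(kappa: float) -> Bounds:
    """Hypothetical checker whose bound around 0.5 shrinks with kappa."""
    return max(0.0, 0.5 - 50 * kappa), min(1.0, 0.5 + 50 * kappa)


print(refine(demo_check, p=0.5)[2:])  # (2, 'precise')
print(refine(demo_check, p=0.9)[2:])  # (1, 'decided')
```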


2.1 State Space Approximation 


STAMINA: A New Infinite-State CTMC Model-Checker 543

The state space approximation method [22] truncates the state space based on a
user-specified reachability threshold $\kappa$. During state exploration, the reachability-value
function $\hat{R}: X \to \mathbb{R}_{\geq 0}$ estimates the probability of reaching a state on-the-fly, and is
compared against $\kappa$ to determine whether the state search should terminate. Only states
with a reachability-value higher than the reachability threshold are explored further.
Figure 2 illustrates the standard breadth-first search (BFS) state exploration for
reachability threshold $\kappa = 0.25$. It starts from the initial state $x_0$, whose reachability-value
$\hat{R}(x_0)$ is initialized to 1.0, as shown in Fig. 2a. In the first step, two new states
$x_1$ and $x_4$ are generated, and their reachability-values are 0.8 and 0.2, respectively,
as shown in Fig. 2b. The reachability-value of $x_0$ is distributed to its successor states
based on the probabilities of the outgoing transitions from $x_0$ to its successors. For the
next step, only state $x_1$ is scheduled for exploration because $\hat{R}(x_1) > \kappa$. Note that the
transition from $x_4$ to $x_0$ is executed because $x_0$ is already in the explored set. Expanding
$x_1$ leads to two new states, namely $x_2$ and $x_5$, as shown in Fig. 2c, of which only
$x_5$ is scheduled for further exploration. This leads to the generation of two further states,
shown in Fig. 2d. State exploration terminates after Fig. 2e, since both newly generated
states have reachability-values less than 0.25. States $x_2$ and $x_4$, together with the two
states generated from $x_5$, are marked as terminal states. During state exploration, the
reachability-value is updated every time a new incoming path is added to a state, because
a new incoming path can add its contribution to the state, potentially bringing its
reachability-value above $\kappa$, which in turn changes a terminal state into a non-terminal one.
When the truncated CTMC model $\mathcal{C}_{\downarrow\kappa}$ is analyzed, some error is introduced in the
probability value of the property under verification because of probability leakage (i.e.,
the cumulative probability of paths reaching states not included in the explored state
space) during the CTMC analysis. To account for this probability loss, an abstract absorbing
state $x_{abs}$ is created as the sole successor state of all terminal states on each truncated
path. Figure 2e shows the addition of the absorbing state.

Fig. 2. State space approximation.

544 T. Neupane et al.
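The exploration just described can be sketched in Python as follows. This is our simplified reading, not STAMINA's implementation: successors come with branching probabilities (for a CTMC we assume the embedded jump chain, i.e., rate divided by exit rate), a state is expanded at most once, and probability that later reaches an already expanded state is not propagated further. The names `truncate`, `walk`, and `x_abs` are hypothetical.

```python
from collections import deque
from collections.abc import Hashable
from typing import Callable, Dict, List, Set, Tuple

State = Hashable
ABSORBING = "x_abs"  # hypothetical name for the abstract absorbing state


def truncate(succ: Callable[[State], List[Tuple[State, float]]],
             init: State, kappa: float):
    """Threshold-truncated BFS; succ(s) yields (successor, branching probability)."""
    reach: Dict[State, float] = {init: 1.0}  # reachability-value estimates
    expanded: Set[State] = set()
    edges: Dict[State, List[Tuple[State, float]]] = {}
    queue = deque([init])
    while queue:
        s = queue.popleft()
        if s in expanded:          # a state may be queued more than once
            continue
        expanded.add(s)
        edges[s] = list(succ(s))
        for t, p in edges[s]:
            # Each new incoming path adds its contribution to t's estimate.
            reach[t] = reach.get(t, 0.0) + reach[s] * p
            if t not in expanded and reach[t] > kappa:
                queue.append(t)    # t is (or becomes) non-terminal
    terminal = set(reach) - expanded
    for s in terminal:
        edges[s] = [(ABSORBING, 1.0)]  # redirect truncated paths to x_abs
    return reach, edges, terminal


def walk(s):
    """Hypothetical infinite chain: from n, go to n + 1 or stop in ("done", n)."""
    if isinstance(s, tuple):
        return []                  # "done" states have no outgoing transitions
    return [(s + 1, 0.5), (("done", s), 0.5)]


reach, edges, terminal = truncate(walk, 0, kappa=0.1)
print(sorted(map(str, terminal)))  # ["('done', 3)", '4']
```

Although `walk` describes an infinite state space, only seven states are expanded for kappa = 0.1; the two terminal states carry an estimated 0.125 of probability mass, which is the kind of leakage the absorbing state accounts for.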


2.2 Property Based State Space Exploration 


This paper introduces a property-guided state expansion method in order to efficiently
obtain a tightened probability bound. Since all non-nested CSL path formulas $\phi$ (except
those containing the "next" operator) derive from the "until" formula $\Phi\, \mathcal{U}^{I}\, \Psi$,
constructing the set of terminal states for further expansion boils down to eliminating
states that are known to satisfy or violate $\Phi\, \mathcal{U}\, \Psi$. Given a state graph, a path starting
from the initial state can never satisfy $\Phi\, \mathcal{U}\, \Psi$ if it includes a state satisfying
$\neg\Phi \wedge \neg\Psi$ before reaching a $\Psi$-state. Also, if a path includes a state satisfying $\Psi$,
the satisfiability of $\Phi\, \mathcal{U}\, \Psi$ can be determined without further expanding this path
beyond the first $\Psi$-state. Our property-guided state space expansion method identifies
the path prefixes from which the satisfiability of $\Phi\, \mathcal{U}\, \Psi$ can be determined, and
shortens them by making the last state of each prefix absorbing based on the satisfiability
of $(\neg\Phi \vee \Psi)$. Only the non-absorbing states whose path probability is greater than the
state probability estimate threshold $\kappa$ are expanded further. For detailed algorithms of
STAMINA, readers are referred to [21].
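A minimal Python sketch of the property-guided cut: the successor function is wrapped so that any state satisfying (not Phi) or Psi gets no outgoing transitions, i.e., becomes absorbing. For brevity it counts states with a plain BFS and ignores both the threshold kappa and the time bound I; the counter model and all function names are hypothetical.

```python
from collections import deque
from collections.abc import Hashable
from typing import Callable, List, Set, Tuple

State = Hashable
Succ = Callable[[State], List[Tuple[State, float]]]


def property_guided(succ: Succ, phi: Callable[[State], bool],
                    psi: Callable[[State], bool]) -> Succ:
    """Stop expanding at states where Phi U Psi is already decided."""
    def guided(s: State) -> List[Tuple[State, float]]:
        if (not phi(s)) or psi(s):   # (not Phi) or Psi: make s absorbing
            return []
        return succ(s)
    return guided


def reachable(succ: Succ, init: State) -> Set[State]:
    """Plain BFS without a probability threshold, used only to count states."""
    seen, queue = {init}, deque([init])
    while queue:
        for t, _ in succ(queue.popleft()):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return seen


def counter(n: int) -> List[Tuple[int, float]]:
    """Hypothetical finite model: a counter that steps from 0 up to 10."""
    return [(n + 1, 1.0)] if n < 10 else []


print(len(reachable(counter, 0)))  # 11
guided = property_guided(counter, phi=lambda n: True, psi=lambda n: n >= 3)
print(len(reachable(guided, 0)))   # 4
```

Cutting at the first Psi-state (here, counter value 3) loses nothing for Phi U Psi, because the outcome of every path through that state is already fixed.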


3 Results 


This section presents results on the following case studies to illustrate the accuracy and 
efficiency of STAMINA: a genetic toggle switch [20,22]; the following examples from 
the PRISM benchmark suite [15]: grid world robot, cyclic server polling system, and 
tandem queuing network; and the Jackson queuing network from the INFAMY case studies
[1]. All case studies are evaluated on both STAMINA and INFAMY, except the genetic
toggle switch¹. Experiments are performed on a 3.2 GHz AMD Debian Linux PC with
six cores and 64 GB of RAM. For all experiments, the maximal number of iterations $N$
is set to 10, and the reduction factor is set to 1000. All experiments terminate due
to $(P_{max} - P_{min}) < \epsilon$, where $\epsilon = 10^{-3}$, before they reach $N$ iterations. STAMINA is freely
available at: https://github.com/formal-verification-research/stamina.

We compare the runtime, state space size, and verification results of STAMINA
and INFAMY using the same precision $\epsilon = 10^{-3}$. For all tables in this section,
column $\kappa$ reports the probability estimate threshold used to terminate state generation
in STAMINA. The state space size is listed in column $|G|$ (K), where K indicates one
thousand states. Column $T$ (C/A) reports the state space construction (C) and analysis
(A) time in seconds. For STAMINA, the total construction and analysis time is the
cumulative runtime over all $\kappa$ values for a model configuration. Columns $P_{min}$ and
$P_{max}$ list the lower and upper probability bounds for the property under verification,
and column $P$ lists the single probability value (within the precision $\epsilon$) reported by
INFAMY. We select the best runtime reported by three configurations of INFAMY. The
improvements in state space size (column $|G|$ (×)) and runtime (column $T$ (%)) are
represented by the ratio of the state count generated by INFAMY to that of STAMINA
(higher is better) and the percentage improvement in runtime (higher is better),
respectively.

¹ INFAMY generates arithmetic errors on the genetic toggle switch model.
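The two improvement columns can be reproduced with a little arithmetic. We assume the runtime percentage is computed from total time (construction plus analysis) as 100 × (T_INFAMY − T_STAMINA) / T_INFAMY; this reproduces, for example, the Robot 32/64 row of Table 2, but the formula is our inference rather than one stated in the text, and the function name is hypothetical.

```python
def improvement(stamina_states: float, stamina_time: float,
                infamy_states: float, infamy_time: float):
    """State-count ratio (INFAMY / STAMINA) and runtime improvement in percent."""
    ratio = infamy_states / stamina_states
    percent = 100.0 * (infamy_time - stamina_time) / infamy_time
    return ratio, percent


# Robot, n/K = 32/64: |G| = 696 vs 1,591 (K states); T(C/A) = 41/279 vs 492/18 seconds.
ratio, percent = improvement(696, 41 + 279, 1591, 492 + 18)
print(f"{ratio:.1f}x, {percent:.1f}%")  # 2.3x, 37.3%
```

A negative percentage therefore means that STAMINA's total runtime exceeds INFAMY's, even when STAMINA explores far fewer states.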


Genetic Toggle Switch. The genetic toggle switch circuit model has two inputs, aTc
and IPTG. It can be set to the OFF state by supplying it with aTc and to the ON state
by supplying it with IPTG [20]. Two important properties of a toggle switch circuit are
the response time and the failure rate. The first experiments set IPTG to 100 to measure
the toggle switch's response time. Note that the input value of 100 molecules of IPTG is
chosen to ensure that the circuit switches to the ON state. The later experiments initialize
IPTG to 0 to compute the failure rate, i.e., the probability that the circuit changes state
erroneously within a cell cycle of 2,100 s (an approximation of the cell cycle in E. coli
[24]). Initially, LacI is set to 60 and TetR is set to 0 for both experiments. The CSL
property used for both experiments, $P_{=?}[\,\mathit{true}\ \mathcal{U}^{\le 2100}\ (\mathit{TetR} > 40 \wedge \mathit{LacI} < 20)\,]$,
describes the probability of the circuit switching to the ON state within a cell cycle of
2,100 s. The ON state is defined as LacI below 20 and TetR above 40 molecules.


Table 1. Verification results for genetic toggle switch.

              STAMINA
IPTG   κ       |G|      T (C/A)     Pmin       Pmax       Remark
100    10^-3   1,127    0.15/0.67   0.000000   0.999671   Property guided
       10^-6   4,461    0.43/2.84   0.966947   0.992908
       10^-9   7,163    0.43/5.25   0.991738   0.991797
100    10^-6   5,171    0.17/1.90   0.977942   0.992850   Property agnostic
       10^-9   8,908    0.18/3.74   0.991739   0.991797
0      10^-3   182      0.05/0.07   0.000000   0.697500   Property guided
       10^-6   2,438    0.16/1.08   0.008814   0.060424
       10^-9   4,284    0.09/2.12   0.013097   0.013609
0      10^-6   2,446    0.16/1.05   0.009169   0.060420   Property agnostic
       10^-9   4,820    0.13/2.13   0.013097   0.013609


The property-agnostic state space is generated with the probability estimate threshold
$\kappa = 10^{-3}$. Table 1 shows large probability bounds: [0, 0.999671] for IPTG = 100
and [0, 0.6975] for IPTG = 0. These are clearly far too wide with respect to the
precision $\epsilon = 10^{-3}$. $\kappa$ is then reduced to $10^{-6}$, and state generation switches
to the property-guided state expansion mode, where the CSL property is used to guide
state exploration based on the previous state graph. Each state expansion step reduces
the $\kappa$ value by a factor of 1000. To measure the effectiveness of the property-guided
state expansion approach, we compare state graphs generated with and without
property-guided state expansion, as indicated by the "property agnostic" and "property
guided" rows in the table. Property-guided state expansion reduces the size of the
state space without losing analysis precision for the same value of $\kappa$. Specifically,
it reduces the state space by almost 20% for the response time experiment (7,163
versus 8,908 states at $\kappa = 10^{-9}$).


Robot World. This case study considers a robot moving in an n-by-n grid and a janitor
moving in a larger Kn-by-Kn grid, where the constant K is used to significantly scale
up the state space. The robot starts from the bottom left corner and aims to reach the
top right corner. The janitor moves around randomly. The robot and the janitor cannot
occupy the same grid location at the same time. The robot also randomly communicates
with the base station. The property of interest is the probability that the robot reaches
the top right corner within 100 time units while periodically communicating with the base station,
encoded as P=» | (Pso.5 | true US" communicate ]) US! goal ]. 

Table 2 provides a comparison of results for K = 1024, 64 and n = 64, 32. For the
smaller grid size, i.e., 32-by-32, the robot can reach the goal with a high probability of
97.56%. In contrast, for the larger grid size n = 64, the robot reaches the goal only
with a very small probability (about $1.5 \times 10^{-4}$). STAMINA generates precise results
that are similar to those of INFAMY, while exploring less than half as many states and
with shorter runtime.


Table 2. Comparison between STAMINA and INFAMY.

                            STAMINA                                    INFAMY                           Improvement
Model          Params    |G| (K)  T (C/A)    Pmin      Pmax       |G| (K)  T (C/A)     P          |G| (×)  T (%)
Robot (n/K)    32/64     696      41/279     0.975     0.975      1,591    492/18      0.975      2.3      37.3
               32/1024   696      41/258     0.975     0.975      1,591    501/18      0.975      2.3      42.4
               64/64     2,273    135/669    1.46e-4   1.68e-4    5,088    1,625/53    1.5e-4     2.2      52.1
               64/1024   2,273    132/621    1.46e-4   1.68e-4    5,088    1,625/53    1.5e-4     2.2      55.2
Jackson (N/λ)  4/5       201      22/51      0.865     0.865      635      109/5       0.865      3.2      36.1
               5/5       2,539    990/996    0.819     0.819      7,029    1,668/108   0.819      2.8      -11.8
Polling (N)    12        19       3/21       1.0       1.0        74       1/2         1.0        3.9      -732.2
               16        57       18/70      1.0       1.0        1,573    5/54        1.0        27.6     -48.2
               20        113      30/77      1.0       1.0        31,457   151/1,347   1.0        278.4    92.9
Tandem (c)     2047      33       1/41       0.498     0.498      2,392    3/38        0.498      72.5     -1.4
               4095      66       1/141      0.499     0.499      9,216    11/265      0.499      139.6    48.7


Jackson Queuing Network. A Jackson queuing network consists of N interconnected
nodes (queues) with infinite queue capacity. Initially, all queues are empty. Each
station is connected to a single server, which distributes the arrived jobs to different
stations. Customers arrive as a Poisson stream with intensity λ for N queues. The
model is taken from [10,13]. We compute the probability that, within 10 time units, the
first queue has more than 3 jobs and the second queue has more than 5 jobs, given by
$P_{=?}[\,\mathit{true}\ \mathcal{U}^{\le 10}\ (\mathit{jobs\_1} \ge 4 \wedge \mathit{jobs\_2} \ge 6)\,]$.

Table 2 summarizes the results for this model. For N = 5, STAMINA spends roughly
equal time constructing and analyzing the model, whereas INFAMY spends most of its
time constructing the state space; overall, STAMINA is about 11.8% slower in this
configuration. For N = 4, STAMINA is faster in generating verification results. In both
configurations, STAMINA explores only approximately one third of the states explored
by INFAMY.
Cyclic Server Polling System. This case study is based on a cyclic server attending
N stations. We consider the probability that station one is polled within 10 time units,
$P_{=?}[\,\mathit{true}\ \mathcal{U}^{\le 10}\ \mathit{station1\_polled}\,]$. Table 2 summarizes the verification results for
N = 12, 16, 20. The probability of station one being polled within 10 time units is 1.0 for
all configurations. Similar to the previous case studies, STAMINA explores a significantly
smaller state space. The runtime advantage of STAMINA starts to manifest as the size of
the model (and hence of the state space) grows.


Tandem Queuing Network. A tandem queuing network is the simplest interconnected
queuing network, consisting of two finite-capacity (c) queues with one server each [18].
Customers join the first queue and enter the second queue immediately after completing
service. We consider the probability that the first queue becomes full within 0.25 time
units, expressed by the CSL property $P_{=?}[\,\mathit{true}\ \mathcal{U}^{\le 0.25}\ \mathit{queue1\_full}\,]$.

As seen in Table 2, the probability that the first queue is full within 0.25 time units is
almost fifty percent, irrespective of the queue capacity. As for the polling server, STAMINA
explores a significantly smaller state space. The runtime is similar for the model with the
smaller queue capacity (c = 2047), but it improves as the queue capacity increases.


4 Conclusions 


This paper presents an infinite-state stochastic model checker, STAMINA, that uses
path probability estimates to generate states with high probability and to truncate unlikely
states based on a specified threshold. Initial state construction is property agnostic, and
the resulting state space is used for stochastic model checking of a given CSL property.
The computed lower and upper bounds on the probability of the CSL property are
guaranteed to include the actual probability. Next, if finer precision of the probability
bound is required, STAMINA uses a property-guided state expansion technique to
explore states and tighten the reported probability range incrementally. STAMINA is
implemented on top of the PRISM model checker, with tight integration into its API.
The performance and accuracy evaluation, performed on case studies taken from
various application domains, shows significant improvement over the state-of-the-art
infinite-state stochastic model checker INFAMY. For future work, we plan to investigate
methods to determine the reduction factor on-the-fly based on the probability bound.
Another direction is to investigate heuristics to further improve the property-guided
state expansion, as well as techniques to dynamically remove unlikely states.
Abstract. We develop a compositional, algebraic theory of skipping
refinement, as well as local proof methods to effectively analyze the cor- 
rectness of optimized reactive systems. A verification methodology based 
on refinement involves showing that any infinite behavior of an optimized 
low-level implementation is a behavior of the high-level abstract speci- 
fication. Skipping refinement is a recently introduced notion to reason 
about the correctness of optimized implementations that run faster than 
their specifications, i.e., a step in the implementation can skip multiple 
steps of the specification. For the class of systems that exhibit bounded 
skipping, existing proof methods have been shown to be amenable to 
mechanized verification using theorem provers and model-checkers. How- 
ever, reasoning about the correctness of reactive systems that exhibit 
unbounded skipping using these proof methods requires reachability 
analysis, significantly increasing the verification effort. In this paper, we 
develop two new sound and complete proof methods for skipping refinement.
Even in the presence of unbounded skipping, these proof methods
require only local reasoning and, therefore, are amenable to mechanized 
verification. We also show that skipping refinement is compositional, so it 
can be used in a stepwise refinement methodology. Finally, we illustrate 
the utility of the theory of skipping refinement by proving the correctness 
of an optimized event processing system. 


1 Introduction 


Reasoning about the correctness of a reactive system using refinement involves 
showing that any (infinite) observable behavior of a low-level, optimized imple- 
mentation is a behavior allowed by the simple, high-level abstract specification. 
Several notions of refinement like trace containment, (bi)simulation refinement, 
stuttering (bi)simulation refinement, and skipping refinement [4,10,14,20, 22] 
have been proposed in the literature to directly account for the difference in the 
abstraction levels between a specification and an implementation. Two attributes 
of crucial importance that enable us to effectively verify complex reactive sys- 
tems using refinement are: (1) Compositionality: this allows us to decompose a 
monolithic proof establishing that a low-level concrete implementation refines 
a high-level abstract specification into a sequence of simpler refinement proofs,
where each of the intermediate refinement proofs can be performed independently
using verification tools best suited for it; (2) Effective proof methods: analyzing 
the correctness of a reactive system requires global reasoning about its infinite 
behaviors, a task that is often difficult for verification tools. Hence it is crucial 
that the refinement-based methodology also admits effective proof methods that 
are amenable for mechanized reasoning. 

It is known that the (bi)simulation refinement and stuttering (bi)simulation 
refinement are compositional and support the stepwise refinement methodol- 
ogy [20,24]. Moreover, the proof methods associated with them are local, i.e., 
they only require reasoning about states and their successors. Hence, they are 
amenable to mechanized reasoning. However, to the best of our knowledge, it 
is not known if skipping refinement is compositional. Skipping refinement is a 
recently introduced notion of refinement for verifying the correctness of opti- 
mized implementations that can “execute faster” than their simple high-level 
specifications, i.e., a step in the implementation can skip multiple steps in the 
specification. Examples of such systems include superscalar processors, concurrent
and parallel systems, and optimizing compilers. Two proof methods, reduced
well-founded skipping simulation and well-founded skipping simulation, have been
introduced to reason about skipping refinement for the class of systems that
exhibit bounded skipping [10]. These proof methods were used to verify the
correctness of several systems that otherwise were difficult to verify automatically
using current model checkers and automated theorem provers. However, when
skipping is unbounded, the proof methods in [10] require reachability analy- 
sis, and therefore are not amenable to automated reasoning. To motivate the 
need for alternative proof methods for effective reasoning, we consider the event 
processing system (EPS), discussed in [10]. 


1.1 Motivating Example 


An abstract high-level specification, AEPS, of an event processing system is 
defined as follows. Let E be a set of events and V be a set of state variables. 
A state of AEPS is a triple (t, Sch, St), where t is a natural number denoting 
the current time; Sch is a set of pairs (e, t_e), where e ∈ E is an event scheduled
to be executed at time t_e ≥ t; St is an assignment to state variables in V. The
transition relation for the AEPS system is defined as follows. If at time t there is
no (e, t) ∈ Sch, i.e., there is no event scheduled to be executed at time t, then t
is incremented by 1. Otherwise, we (nondeterministically) choose and execute an
event of the form (e, t) ∈ Sch. The execution of an event may result in modifying
St and also in removing pairs from, and adding a finite number of new pairs (e', t') to, Sch.
We require that t' > t. Finally, execution involves removing the executed event
(e, t) from Sch. Now consider tEPS, an optimized implementation of AEPS. As
before, a state is a triple (t, Sch, St). However, unlike the abstract system, which
just increments time by 1 when there are no events scheduled at the current
time, the optimized system finds the earliest future time at which an event is scheduled
to execute. The transition relation of tEPS is defined as follows. An event (e, t_e)
with the minimum time is selected, t is updated to t_e, and the event e is executed,
as in AEPS. Consider the executions of AEPS and tEPS in Fig. 1 (we only
show a prefix of each execution). Suppose that at t = 0, Sch is {(e1, 0)}, and that the
execution of event e1 adds a new pair (e2, k) to Sch, where k is a positive integer. At
t = 0, AEPS executes the event e1, which adds the pair (e2, k) to Sch, and then increments t
to 1. Since no events are scheduled to execute before t = k, AEPS
repeatedly increments t by 1 until t = k. At t = k, it executes the event e2. At
time t = 0, tEPS also executes e1. The next event is scheduled to execute at time
t = k; hence, in a single step, tEPS updates t to k and executes the event
e2. Note that tEPS runs faster than AEPS by skipping over abstract states when
no event is scheduled for execution at the current time. If k > 1, the step from s2
to s3 in tEPS corresponds neither to stuttering nor to a single step of AEPS.
Therefore, notions of refinement based on stuttering simulation and bisimulation
cannot be used to show that tEPS refines AEPS.
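To make the skipping behavior concrete, the following Python sketch replays the example execution. It is only an illustration under stated assumptions: the event semantics in `run_event` is hypothetical (e1 sets v1 to 2 and schedules e2 at time k = 5; e2 sets v2 to 2), AEPS resolves its nondeterministic choice with a fixed ordering, and tEPS is assumed to take an idle time step when Sch is empty, a case the text does not cover. All names are ours.

```python
from typing import Dict, FrozenSet, Set, Tuple

# A state is (t, Sch, St); St is frozen as a sorted tuple of (variable, value).
State = Tuple[int, FrozenSet[Tuple[str, int]], Tuple[Tuple[str, int], ...]]

K = 5  # hypothetical value of k in the example execution


def run_event(e: str, st: Dict[str, int]) -> Tuple[Set[Tuple[str, int]], Dict[str, int]]:
    """Hypothetical event semantics: returns (pairs to add to Sch, new St)."""
    st = dict(st)
    if e == "e1":
        st["v1"] = 2
        return {("e2", K)}, st          # e1 schedules e2 at time K
    if e == "e2":
        st["v2"] = 2
    return set(), st


def execute(t: int, sch, st, ev) -> State:
    """Execute the pair ev at time t, add new pairs, and remove ev from Sch."""
    added, new_st = run_event(ev[0], dict(st))
    return (t, frozenset((sch - {ev}) | added), tuple(sorted(new_st.items())))


def aeps_step(s: State) -> State:
    """AEPS: execute some event due at time t; otherwise increment t by 1."""
    t, sch, st = s
    due = sorted(p for p in sch if p[1] == t)
    if not due:
        return (t + 1, sch, st)
    return execute(t, sch, st, due[0])  # fixed choice stands in for nondeterminism


def teps_step(s: State) -> State:
    """tEPS: jump to the earliest scheduled time and execute that event."""
    t, sch, st = s
    if not sch:
        return (t + 1, sch, st)          # assumption: idle step when Sch is empty
    ev = min(sch, key=lambda p: (p[1], p[0]))
    return execute(ev[1], sch, st, ev)


def run(step, s: State, n: int):
    """Return the prefix s, step(s), ..., of length n + 1."""
    trace = [s]
    for _ in range(n):
        trace.append(step(trace[-1]))
    return trace


s1 = (0, frozenset({("e1", 0)}), (("v1", 1), ("v2", 1)))
concrete = run(teps_step, s1, 2)        # s1, s2, s3
abstract = run(aeps_step, s1, K + 2)    # w1, ..., w_{K+3}
print([s[0] for s in concrete])         # [0, 0, 5]
print([w[0] for w in abstract])         # [0, 0, 1, 2, 3, 4, 5, 5]
print(concrete[-1] == abstract[-1])     # True
```

tEPS reaches the final state in two steps while AEPS needs k + 2; the second tEPS step jumps over all the intermediate abstract states in which only time advances.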


     tEPS (σ)                                        AEPS (δ)
(s1) (0, {(e1, 0)}, {v1 = 1, v2 = 1})        ~~  (0, {(e1, 0)}, {v1 = 1, v2 = 1})        (w1)
(s2) (0, {(e2, k), ...}, {v1 = 2, v2 = 1})   ~~  (0, {(e2, k)}, {v1 = 2, v2 = 1})        (w2)
                                                 (1, {(e2, k), ...}, {v1 = 2, v2 = 1})   (w3)
                                                 ...
(s3) (k, {...}, {v1 = 2, v2 = 2})            ~~  (k, {...}, {v1 = 2, v2 = 2})


Fig. 1. Event simulation system


It was argued in [10] that skipping refinement is an appropriate notion of 
correctness that directly accounts for the skipping behavior exhibited by tEPS. 
Though tEPS was used to motivate the need for a new notion of refinement,
the proof methods proposed in [10] are not effective for proving the correctness
of tEPS. This is because the execution of an event in tEPS may add new events
that are scheduled to execute at an arbitrary time in the future, i.e., in general, k
in the above example execution is unbounded. Hence, the proof methods in [10]
would require unbounded reachability analysis, which is often problematic for
automated verification tools. Even in the particular case when one can a priori
determine an upper bound on k and unroll the transition relation, the proof
methods in [10] are viable for mechanical reasoning only if the upper bound on k is
relatively small.

In this paper, we develop local proof methods to effectively analyze the cor- 
rectness of optimized reactive systems using skipping refinement. These proof 
methods reduce global reasoning about infinite computations to local reasoning 
about states and their successors and are applicable even if the optimized
implementation exhibits unbounded skipping. Moreover, we show that the proposed
proof methods are complete, i.e., if a system M_1 is a skipping refinement of
M_2 under a suitable refinement map, then we can always locally reason about
them. We also develop an algebraic theory of skipping refinement. In particular, 
we show that skipping simulation is closed under relational composition. Thus, 
skipping refinement aligns with the stepwise refinement methodology. Finally, 
we illustrate the benefits of the theory of skipping refinement and the associ- 
ated proof methods by verifying the correctness of optimized event processing 
systems in ACL2s [3]. 


2 Preliminaries 


A transition system model of a reactive system captures the concept of a state, 
atomic transitions that modify state during the course of a computation, and 
what is observable in a state. Any system with a well-defined operational
semantics can be mapped to a labeled transition system.


Definition 1 Labeled Transition System. A labeled transition system (TS)
is a structure ⟨S, →, L⟩, where S is a non-empty (possibly infinite) set of states,
→ ⊆ S × S is a left-total transition relation (every state has a successor), and
L is a labeling function whose domain is S.
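For finite systems, Definition 1 can be sketched directly as a Python data structure. The class `TS` and its methods are illustrative names of ours; the paper allows infinite state sets, which this finite representation cannot capture.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Hashable, Set, Tuple


@dataclass
class TS:
    """A finite labeled transition system <S, ->, L> (Definition 1)."""
    states: FrozenSet[Hashable]
    trans: FrozenSet[Tuple[Hashable, Hashable]]   # the relation -> as a set of pairs
    label: Callable[[Hashable], Hashable]         # the labeling function L

    def successors(self, s: Hashable) -> Set[Hashable]:
        return {v for (u, v) in self.trans if u == s}

    def is_well_formed(self) -> bool:
        """S is non-empty, -> is a subset of S x S, and -> is left-total."""
        in_s = all(u in self.states and v in self.states for (u, v) in self.trans)
        left_total = all(self.successors(s) for s in self.states)
        return bool(self.states) and in_s and left_total


m = TS(frozenset({0, 1, 2}), frozenset({(0, 1), (1, 2), (2, 2)}), lambda s: s % 2)
print(m.is_well_formed())   # True
```

The self-loop on state 2 is what keeps the relation left-total; dropping it would make `is_well_formed` return False.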


Notation: We first describe the notational conventions used in the paper. Function
application is sometimes denoted by an infix dot “.” and is left-associative.
The composition of relation R with itself i times (for 0 < i < ω) is denoted R^i
(ω = ℕ is the first infinite ordinal). Given a relation R and 1 < k < ω, R^{<k}
denotes ∪_{1≤i<k} R^i and R^{≥k} denotes ∪_{ω>i≥k} R^i. Instead of R^{≥1} we often write
the more common R^+. ⊎ denotes the disjoint union operator. Quantified expressions
are written as ⟨Qx : r : t⟩, where Q is the quantifier (e.g., ∃, ∀, min, ∪), x is
a bound variable, r is an expression that denotes the range of variable x (true,
if omitted), and t is a term.

Let M = ⟨S, →, L⟩ be a transition system. An M-path is a sequence of states
such that for adjacent states s and u, s → u. The j-th state in an M-path σ is
denoted by σ.j. An M-path σ starting at state s is a fullpath, denoted by fp.σ.s,
if it is infinite. An M-segment ⟨v_1, ..., v_k⟩, where k ≥ 1, is a finite M-path and
is also denoted by v⃗. The length of an M-segment v⃗ is denoted by |v⃗|. Let
INC be the set of strictly increasing sequences of natural numbers starting at
0. The i-th partition of a fullpath σ with respect to π ∈ INC, denoted by ^πσ^i, is
given by the M-segment ⟨σ(π.i), ..., σ(π(i + 1) − 1)⟩.
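Fullpaths are infinite, but the partition notation is easy to check on finite prefixes. The sketch below (helper names are ours) computes ^πσ^i from a prefix of σ and a prefix of π ∈ INC, refusing to answer when the prefixes are too short to determine the partition.

```python
from typing import List, Sequence


def is_inc_prefix(pi: Sequence[int]) -> bool:
    """A finite prefix of an INC sequence: starts at 0, strictly increasing."""
    return len(pi) > 0 and pi[0] == 0 and all(a < b for a, b in zip(pi, pi[1:]))


def partition(sigma: Sequence, pi: Sequence[int], i: int) -> List:
    """The i-th partition of sigma w.r.t. pi: <sigma(pi.i), ..., sigma(pi(i+1) - 1)>."""
    if not is_inc_prefix(pi):
        raise ValueError("pi is not a prefix of an INC sequence")
    if i + 1 >= len(pi) or pi[i + 1] > len(sigma):
        raise ValueError("prefixes too short to determine partition i")
    return list(sigma[pi[i]:pi[i + 1]])


sigma = ["s0", "s1", "s2", "s3", "s4"]      # finite prefix of a fullpath
pi = [0, 1, 2, 4]                            # finite prefix of some pi in INC
print([partition(sigma, pi, i) for i in range(3)])  # [['s0'], ['s1'], ['s2', 's3']]
```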


3 Theory of Skipping Refinement 


In this section we first briefly recall the notion of skipping simulation as described 
in [10]. We then study the algebraic properties of skipping simulation and show 
that a theory of refinement based on it is compositional and therefore can be 
used in a stepwise refinement based verification methodology. 
The definition of skipping simulation is based on the notion of matching.
Informally, a fullpath σ matches a fullpath δ under the relation B iff the fullpaths
can be partitioned into non-empty, finite segments such that all elements in a
segment of σ are related to the first element in the corresponding segment of δ.


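Definition 2 smatch [10]. Let M = ⟨S, →, L⟩ be a transition system and let σ, δ be
fullpaths in M. For π, ξ ∈ INC and binary relation B ⊆ S × S, we define

$$\mathit{scorr}(B, \sigma, \pi, \delta, \xi) \equiv \langle \forall i \in \omega :: \langle \forall s \in {}^{\pi}\sigma^{i} :: sB(\delta(\xi.i)) \rangle \rangle$$

and

$$\mathit{smatch}(B, \sigma, \delta) \equiv \langle \exists \pi, \xi \in \mathit{INC} :: \mathit{scorr}(B, \sigma, \pi, \delta, \xi) \rangle.$$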


Figure 1 illustrates the notion of matching using our running example: σ is
the fullpath of the concrete system and δ is a fullpath of the abstract system.
(The figure only shows a prefix of each fullpath.) The other parameter for
matching is the relation B, which is just the identity relation. In order to show
that smatch(B, σ, δ) holds, we have to find π, ξ ∈ INC satisfying the definition.
In Fig. 1, we separate the partitions induced by our choice of π and ξ using horizontal
lines and connect elements related by B with ~~. Since all elements of a σ partition are
related to the first element of the corresponding δ partition, scorr(B, σ, π, δ, ξ)
holds; therefore, smatch(B, σ, δ) holds.
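The matching argument above can be replayed on finite prefixes. The following sketch is a sanity check only (smatch quantifies over infinite sequences, which no finite test can establish): it checks the scorr condition for every partition whose extent is visible. The states are simplified encodings (t, Sch, v1, v2) of the Fig. 1 states, with k = 3 chosen arbitrarily, and B is taken to be equality.

```python
from typing import Callable, Sequence


def scorr_prefix(B: Callable[[object, object], bool],
                 sigma: Sequence, pi: Sequence[int],
                 delta: Sequence, xi: Sequence[int]) -> bool:
    """Check scorr(B, sigma, pi, delta, xi) on the partitions visible in finite
    prefixes: each state of sigma's i-th pi-partition must be B-related to
    delta(xi.i), the first state of delta's i-th xi-partition."""
    n = min(len(pi), len(xi)) - 1          # partitions whose end is known
    for i in range(n):
        first = delta[xi[i]]
        if not all(B(s, first) for s in sigma[pi[i]:pi[i + 1]]):
            return False
    return True


k = 3
# Abstract prefix w1, w2, w3, ..., w_{k+3}; states are (t, Sch, v1, v2).
w = [(0, frozenset({("e1", 0)}), 1, 1),
     (0, frozenset({("e2", k)}), 2, 1)]
w += [(t, frozenset({("e2", k)}), 2, 1) for t in range(1, k + 1)]
w += [(k, frozenset(), 2, 2)]              # after executing e2
sigma = [w[0], w[1], w[-1]]                # concrete prefix s1, s2, s3
pi, xi = [0, 1, 2, 3], [0, 1, k + 2, k + 3]
identity = lambda a, b: a == b
print(scorr_prefix(identity, sigma, pi, w, xi))   # True
```

The second δ partition, w[1:k+2], contains the states in which only time advances; s2 is related only to its first element, which is exactly how skipping is absorbed by matching.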

Using the notion of matching, skipping simulation is defined as follows. Notice 
that skipping simulation is defined using a single transition system; it is easy 
to lift the notion defined on a single transition system to one that relates two 
transition systems by taking the disjoint union of the transition systems. 


Definition 3 Skipping Simulation (SKS). B ⊆ S × S is a skipping simulation
on a TS M = ⟨S, →, L⟩ iff for all s, w such that sBw, both of the following
hold.

(SKS1) L.s = L.w
(SKS2) ⟨∀σ : fp.σ.s : ⟨∃δ : fp.δ.w : smatch(B, σ, δ)⟩⟩


Theorem 1. Let M be a TS. If B is a stuttering simulation (STS) on M then 
B is an SKS on M. 


Proof: Follows directly from the definitions of SKS and STS [18]. 


3.1 Algebraic Properties 


We now study the algebraic properties of SKS. We show that it is closed under 
arbitrary union. We also show that SKS is closed under relational composition. 
The latter property is particularly useful since it enables us to use stepwise
refinement and to modularly analyze the correctness of complex systems.


Lemma 1. Let M be a TS and C be a set of SKSs on M. Then G = ⟨∪B :
B ∈ C : B⟩ is an SKS on M.


Corollary 1. For any TS M, there is a greatest SKS on M. 
Lemma 2. SKSs are not closed under negation and intersection.


The following lemma shows that skipping simulation is closed under relational 
composition. 


Lemma 3. Let M be a TS. If P and Q are SKS’s on M, then R = P;Q is an 
SKS on M. 
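The composition P;Q in Lemma 3 is written in diagrammatic order: sRw holds when some intermediate x satisfies sPx and xQw, which is the fact the proof starts from. For finite relations encoded as sets of pairs (an illustrative encoding of ours), it can be computed as follows.

```python
from typing import Hashable, Set, Tuple

Rel = Set[Tuple[Hashable, Hashable]]


def compose(P: Rel, Q: Rel) -> Rel:
    """P;Q = {(s, w) | there is an x with sPx and xQw} (apply P first, then Q)."""
    return {(s, w) for (s, x) in P for (y, w) in Q if x == y}


P = {(1, "a"), (2, "b")}
Q = {("a", "X"), ("b", "Y"), ("b", "Z")}
print(sorted(compose(P, Q)))   # [(1, 'X'), (2, 'Y'), (2, 'Z')]
```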


Proof: To show that R is an SKS on M = ⟨S, →, L⟩, we show that for any
s, w ∈ S such that sRw, SKS1 and SKS2 hold. Let s, w ∈ S and sRw. From the
definition of R, there exists x ∈ S such that sPx and xQw. Since P and Q are
SKSs on M, L.s = L.x = L.w; hence, SKS1 holds for R.

To prove that SKS2 holds for R, consider a fullpath σ starting at s. Since
P and Q are SKSs on M, there is a fullpath τ in M starting at x, a fullpath
δ in M starting at w, and α, β, θ, γ ∈ INC such that scorr(P, σ, α, τ, β) and
scorr(Q, τ, θ, δ, γ) hold. We use the fullpath δ as a witness and define π, ξ ∈ INC
such that scorr(R, σ, π, δ, ξ) holds.

We define a function, r, that given i, corresponding to the index of a partition
of τ under β, returns the index of the partition of τ under θ in which the first
element of τ's i-th partition under β resides: r.i = j iff θ.j ≤ β.i < θ(j + 1). Note
that r is indeed a function, as every element of τ resides in exactly one partition
of θ. Also, since there is a correspondence between the partitions of α and β
(by scorr(P, σ, α, τ, β)), we can apply r to indices of partitions of σ under α to
find where the first element of the corresponding β partition resides. Note that
r is non-decreasing: a ≤ b ⇒ r.a ≤ r.b.

We define mα ∈ INC, a strictly increasing sequence that will allow us to merge
adjacent partitions in α as needed to define the strictly increasing sequence π on
σ used to prove SKS2. Partitions in π will consist of one or more α partitions.
Given i, corresponding to the index of a partition of σ under π, the function mα
returns the index of the corresponding partition of σ under α.

$$m\alpha(0) = 0$$

$$m\alpha(i) = \langle \min j \in \omega : |\{k : 0 < k \le j \wedge r.k \neq r(k-1)\}| = i \rangle$$

Note that mα is an increasing function, i.e., a < b ⇒ mα(a) < mα(b). We now
define π as follows.

$$\pi.i = \alpha(m\alpha.i)$$

There is an important relationship between r and mα:

$$r(m\alpha.i) = \cdots = r(m\alpha(i+1) - 1)$$

That is, for all α partitions that are in the same π partition, the initial states of
the corresponding β partitions are in the same θ partition.
We define ξ as follows: ξ.i = γ(r(mα.i)).
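We are now ready to prove SKS2. Let s ∈ ^πσ^i. We show that sRδ(ξ.i). By
the definition of π, we have

$$s \in {}^{\alpha}\sigma^{m\alpha.i} \vee \cdots \vee s \in {}^{\alpha}\sigma^{m\alpha(i+1)-1}$$

Hence,

$$sP\tau(\beta(m\alpha.i)) \vee \cdots \vee sP\tau(\beta(m\alpha(i+1)-1))$$

Note that by the definition of r (apply r to mα.i):

$$\theta(r(m\alpha.i)) \le \beta(m\alpha.i) < \theta(r(m\alpha.i) + 1)$$

and similarly for every index from mα.i to mα(i + 1) − 1. Hence, by scorr(Q, τ, θ, δ, γ),

$$\tau(\beta(m\alpha.i))\,Q\,\delta(\gamma(r(m\alpha.i))) \wedge \cdots \wedge \tau(\beta(m\alpha(i+1)-1))\,Q\,\delta(\gamma(r(m\alpha(i+1)-1)))$$

By the definition of ξ and the relationship between r and mα described above,
we simplify the above formula as follows.

$$\tau(\beta(m\alpha.i))\,Q\,\delta(\xi.i) \wedge \cdots \wedge \tau(\beta(m\alpha(i+1)-1))\,Q\,\delta(\xi.i)$$

Therefore, by the definition of R, we have that sRδ(ξ.i) holds.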
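The construction of π and ξ in the proof of Lemma 3 is pure index arithmetic, so it can be replayed on finite prefixes of α, β, θ, and γ. The Python sketch below is illustrative only (function names and numbers are ours): it computes r by binary search, mα as the positions where r changes, and then π.i = α(mα.i) and ξ.i = γ(r(mα.i)).

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def r_map(beta: Sequence[int], theta: Sequence[int]) -> List[int]:
    """r.i = j iff theta.j <= beta.i < theta(j+1): the theta-partition of tau
    that contains the first state of tau's i-th beta-partition."""
    out = []
    for b in beta:
        if b >= theta[-1]:
            raise ValueError("theta prefix too short to place this beta index")
        out.append(bisect_right(theta, b) - 1)
    return out


def m_alpha(r: Sequence[int]) -> List[int]:
    """m_alpha(0) = 0; for i >= 1, m_alpha(i) is the least j such that r changes
    exactly i times among positions 1..j, i.e. the position of the i-th change."""
    return [0] + [k for k in range(1, len(r)) if r[k] != r[k - 1]]


def merge_witnesses(alpha: Sequence[int], beta: Sequence[int],
                    theta: Sequence[int], gamma: Sequence[int]) -> Tuple[List[int], List[int]]:
    """pi.i = alpha(m_alpha.i) and xi.i = gamma(r(m_alpha.i)), as in Lemma 3."""
    r = r_map(beta, theta)
    ma = m_alpha(r)
    pi = [alpha[m] for m in ma]
    xi = [gamma[r[m]] for m in ma]
    return pi, xi


pi, xi = merge_witnesses(alpha=[0, 1, 3, 4, 6], beta=[0, 2, 3, 5, 8],
                         theta=[0, 3, 6, 9], gamma=[0, 2, 5])
print(pi, xi)   # [0, 3, 6] [0, 2, 5]
```

Here β partitions 0 and 1 both start inside θ partition 0, so α partitions 0 and 1 are merged into the first π partition, which is matched with the δ partition starting at γ.0.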


Theorem 2. The reflexive transitive closure of an SKS is an SKS. 
Theorem 3. Given a TS M, the greatest SKS on M is a preorder. 


Proof. Let G be the greatest SKS on M. From Theorem 2, G* is an SKS. Hence
G* ⊆ G. Furthermore, since G ⊆ G*, we have that G = G*, i.e., G is reflexive
and transitive.
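The proof of Theorem 3 rests on the fact that a relation is a preorder exactly when it equals its own reflexive transitive closure. For finite relations, a small fixpoint computation (an illustrative sketch, not taken from the paper) computes R* and makes that characterization executable.

```python
from typing import Hashable, Iterable, Set, Tuple

Rel = Set[Tuple[Hashable, Hashable]]


def rt_closure(R: Rel, states: Iterable[Hashable]) -> Rel:
    """Reflexive transitive closure R* of a finite relation R over `states`."""
    closure = {(s, s) for s in states} | set(R)
    while True:
        step = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if step <= closure:          # no new pairs: closure is transitive
            return closure
        closure |= step


def is_preorder(G: Rel, states: Iterable[Hashable]) -> bool:
    """G is reflexive and transitive exactly when G equals G*."""
    return rt_closure(G, states) == set(G)


S = {1, 2, 3}
G = rt_closure({(1, 2), (2, 3)}, S)
print(sorted(G))                          # [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)]
print(is_preorder(G, S))                  # True
print(is_preorder({(1, 2), (2, 3)}, S))   # False
```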


3.2 Skipping Refinement 


We now recall the notion of skipping refinement [10]. We use skipping simula- 
tion, a notion defined in terms of a single transition system, to define skipping 
refinement, a notion that relates two transition systems: an abstract transition 
system and a concrete transition system. Informally, if a concrete system is a 
skipping refinement of an abstract system, then its observable behaviors are also 
behaviors of the abstract system, modulo skipping (which includes stuttering). 
The notion is parameterized by a refinement map, a function that maps con- 
crete states to their corresponding abstract states. A refinement map along with 
a labeling function determines what is observable at a concrete state. 


Definition 4 Skipping Refinement. Let M_A = ⟨S_A, →_A, L_A⟩ and M_C =
⟨S_C, →_C, L_C⟩ be transition systems and let r : S_C → S_A be a refinement map.
We say M_C is a skipping refinement of M_A with respect to r, written
M_C ≲_r M_A, if there exists a binary relation B such that all of the following
hold.
1. ⟨∀s ∈ S_C :: sB(r.s)⟩ and
2. B is an SKS on ⟨S_C ⊎ S_A, →_C ⊎ →_A, L⟩, where L.s = L_A(s) for s ∈ S_A, and
   L.s = L_A(r.s) for s ∈ S_C.
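Definition 4 works on a single combined system. As a finite sketch (the encoding is ours: states are tagged "C" or "A" to keep the union disjoint), the following builds the combined transition relation and labeling, together with the pairs (s, r.s) that condition 1 requires B to contain. It does not check SKS2; showing that some B containing these pairs is an SKS remains the actual proof obligation.

```python
from typing import Callable, Dict, Hashable, Set

Trans = Dict[Hashable, Set[Hashable]]


def disjoint_union(trans_c: Trans, trans_a: Trans,
                   label_a: Callable[[Hashable], Hashable],
                   r: Callable[[Hashable], Hashable]):
    """Build <S_C (+) S_A, ->_C (+) ->_A, L> over tagged states, as in Definition 4.
    A concrete state s is labeled L_A(r.s); an abstract state s is labeled L_A(s)."""
    trans = {("C", s): {("C", t) for t in ts} for s, ts in trans_c.items()}
    trans.update({("A", s): {("A", t) for t in ts} for s, ts in trans_a.items()})

    def label(x):
        tag, s = x
        return label_a(s) if tag == "A" else label_a(r(s))

    # Condition 1 of Definition 4: every concrete s must be B-related to r.s.
    required = {(("C", s), ("A", r(s))) for s in trans_c}
    return trans, label, required


# Toy example: a concrete counter that mostly steps by 2 over an abstract mod-5 counter.
trans, label, required = disjoint_union({0: {2}, 2: {4}, 4: {0}},
                                        {i: {(i + 1) % 5} for i in range(5)},
                                        label_a=lambda s: s, r=lambda s: s)
print(all(label(x) == label(y) for (x, y) in required))   # True: SKS1 on these pairs
```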


Next, we use the property that skipping simulation is closed under rela- 
tional composition to show that skipping refinement supports modular reasoning 
using a stepwise refinement approach. In order to verify that a low-level complex 
implementation M_C refines a simple high-level abstract specification M_A, one
proceeds as follows: starting with M_A, define a sequence of intermediate systems
leading to the final complex implementation M_C. Any two successive systems in
the sequence differ only in relatively few aspects of their behavior. We then show 
that, at each step in the sequence, the system at the current step is a refinement 
of the previous one. Since at each step, the verification effort is focused only on 
the few differences in behavior between two systems under consideration, proof 
obligations are simpler than the monolithic proof. Note that this methodology 
is orthogonal to (horizontal) modular reasoning that infers the correctness of a 
system from the correctness of its sub-components. 


Theorem 4. Let M_1 = ⟨S_1, →_1, L_1⟩, M_2 = ⟨S_2, →_2, L_2⟩, and M_3 = ⟨S_3, →_3, L_3⟩
be TSs, p : S_1 → S_2 and r : S_2 → S_3. If M_1 ≲_p M_2 and M_2 ≲_r M_3, then
M_1 ≲_{p;r} M_3.


Proof: Since M_1 ≲_p M_2, we have an SKS, say A, such that ⟨∀s ∈ S_1 :: sA(p.s)⟩.
Furthermore, without loss of generality we can assume that A ⊆ S_1 × S_2. Similarly,
since M_2 ≲_r M_3, we have an SKS, say B, such that ⟨∀s ∈ S_2 :: sB(r.s)⟩
and B ⊆ S_2 × S_3. Define C = A;B. Then we have that C ⊆ S_1 × S_3 and
⟨∀s ∈ S_1 :: sC(r(p.s))⟩. Also, from Lemma 3, C is an SKS on ⟨S_1 ⊎ S_3, →_1 ⊎ →_3, L⟩,
where L.s = L_3(s) if s ∈ S_3, else L.s = L_3(r(p.s)).
Formally, to establish that a complex low-level implementation M_C refines
a simple high-level abstract specification M_A, one defines intermediate systems
M_1, ..., M_{n−1}, where n ≥ 1, and establishes the following: M_C = M_0 ≲_{r_0} M_1 ≲_{r_1}
⋯ ≲_{r_{n−1}} M_n = M_A. Then from Theorem 4, we have that M_C ≲_r M_A,
where r = r_0; r_1; ⋯; r_{n−1}. We illustrate the utility of this approach in Sect. 5 by
proving the correctness of an optimized event processing system.
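The refinement map of the whole chain is the diagrammatic composition r_0; r_1; ⋯; r_{n−1}, i.e., r_0 is applied first, matching r(p.s) in the proof of Theorem 4. A minimal Python sketch, with hypothetical maps standing in for the real ones:

```python
from functools import reduce
from typing import Callable, Sequence


def compose_maps(maps: Sequence[Callable]) -> Callable:
    """Diagrammatic composition r0; r1; ...; r_{n-1}: apply r0 first, then r1, ..."""
    return reduce(lambda f, g: (lambda s: g(f(s))), maps, lambda s: s)


# Hypothetical maps: p turns a list of pairs into a set; identities follow it.
p = lambda state: (state[0], frozenset(state[1]))
identity = lambda s: s
p_prime = compose_maps([p, identity, identity])
print(p_prime((3, [("e", 5), ("f", 7)])) == (3, frozenset({("e", 5), ("f", 7)})))  # True
```

Order matters: composing "add 1" then "double" maps 3 to 8, while the reverse order maps 3 to 7.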


Theorem 5. Let M = ⟨S, →, L⟩ be a TS. Let M' = ⟨S', →', L'⟩, where S' ⊆ S,
→' ⊆ S' × S', →' is a left-total subset of →^+, and L' = L|_{S'}. Then M' ≲_I M,
where I is the identity function on S'.


Corollary 2. Let M_C = ⟨S_C, →_C, L_C⟩ and M_A = ⟨S_A, →_A, L_A⟩ be TSs and
r : S_C → S_A be a refinement map. Let M_{C'} = ⟨S_{C'}, →_{C'}, L_{C'}⟩, where S_{C'} ⊆ S_C,
→_{C'} is a left-total subset of →_C^+, and L_{C'} = L_C|_{S_{C'}}. If M_C ≲_r M_A, then
M_{C'} ≲_{r'} M_A, where r' is r|_{S_{C'}}.
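The hypotheses of Theorem 5 are checkable for finite systems, and they are what the event processing example below relies on (e.g., the transition relation of tEPS being a left-total subset of the transitive closure of that of AEPS). The sketch below assumes an encoding of ours: a transition relation is a dict from each state to its set of successors.

```python
from typing import Dict, Hashable, Set


def reachable_plus(trans: Dict[Hashable, Set[Hashable]], s: Hashable) -> Set[Hashable]:
    """States v with s ->+ v (reachable in one or more steps)."""
    seen, frontier = set(), list(trans.get(s, ()))
    while frontier:
        v = frontier.pop()
        if v not in seen:
            seen.add(v)
            frontier.extend(trans.get(v, ()))
    return seen


def theorem5_applies(trans: Dict[Hashable, Set[Hashable]],
                     sub_trans: Dict[Hashable, Set[Hashable]]) -> bool:
    """Check, for finite systems, that S' is a subset of S, ->' is left-total on S',
    ->' stays within S', and ->' is a subset of ->+."""
    sub_states = set(sub_trans)
    if not sub_states <= set(trans):
        return False
    for s, succs in sub_trans.items():
        if not succs or not succs <= sub_states:
            return False
        if not succs <= reachable_plus(trans, s):
            return False
    return True


trans = {0: {1}, 1: {2}, 2: {3}, 3: {4}, 4: {4}}
print(theorem5_applies(trans, {0: {2}, 2: {4}, 4: {4}}))   # True: skips by 2
print(theorem5_applies(trans, {0: {2}, 2: {0}}))           # False: 0 is not reachable from 2
```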


We now illustrate the usefulness of the theory of skipping refinement using 
our running example of event processing systems. Consider MPEPS, which uses
a priority queue to find a non-empty set of events (say E_t) scheduled to execute
at the current time and executes them. We allow the priority queue in MPEPS
to be deterministic or nondeterministic. For example, the priority queue may
deterministically select a single event in E_t to execute, or, based on considerations
such as resource utilization, it may execute some subset of events in E_t in a single
step. When reasoning about the correctness of MPEPS, one thing to notice is 
that there is a difference in the data structures used in the two systems: MPEPS 
uses a priority queue to effectively find the next set of events to execute in the 
scheduler, while AEPS uses a simple abstract set representation for the scheduler. 
Another thing to notice is that MPEPS can “execute faster” than AEPS in 
two ways: it can increment time by more than 1 and it can execute more than 
one event in a single step. The theory of skipping refinement developed in this 
paper enables us to separate out these concerns and apply a stepwise refinement 
approach to effectively analyse MPEPS. 

First, we account for the difference in the data structures between MPEPS 
and AEPS. Towards this we define an intermediate system MEPS that is identi- 
cal to MPEPS except that the scheduler in MEPS is now represented as a set of 
event-time pairs. Under a refinement map, say p, that extracts the set of event- 
time pairs in the priority queue of MPEPS, a step in MPEPS can be matched by 
a step in MEPS. Hence, MPEPS ≲_p MEPS. Next we account for the difference
between MEPS and AEPS in the number of events the two systems may execute
in a single step. Towards this, observe that the state spaces of MEPS and tEPS
are equal and the transition relation of MEPS is a left-total subset of the transitive
closure of the transition relation of tEPS. Hence, from Theorem 5, we infer
that MEPS is a skipping refinement of tEPS using the identity function, say I_1,
as the refinement map, i.e., MEPS ≲_{I_1} tEPS. Next observe that the state spaces
of tEPS and AEPS are equal and the transition relation of tEPS is a left-total
subset of the transitive closure of the transition relation of AEPS. Hence, from
Theorem 5, tEPS is a skipping refinement of AEPS using the identity function,
say I_2, as the refinement map, i.e., tEPS ≲_{I_2} AEPS. Finally, from the transitivity
of skipping refinement (Theorem 4), we conclude that MPEPS ≲_{p'} AEPS,
where p' = p; I_1; I_2.


4 Mechanised Reasoning 


Proving that a transition system M_C is a skipping refinement of a transition
system M_A using Definition 3 requires us to show that for any fullpath from M_C
we can find a matching fullpath from M_A. However, reasoning about the existence
of infinite sequences can be problematic using automated tools. In this section, 
we develop sound and complete local proof methods that are applicable even if a 
system exhibits unbounded skipping. We first briefly present the proof methods, 
reduced well-founded skipping and well-founded skipping simulation, developed 
in [10]. 


Definition 5 Reduced Well-founded Skipping [10]. B ⊆ S × S is a reduced
well-founded skipping relation on TS M = ⟨S, →, L⟩ iff:

(RWFSK1) ⟨∀s, w ∈ S : sBw : L.s = L.w⟩
(RWFSK2) There exists a function rankt : S × S → W such that ⟨W, ≺⟩ is
         well-founded and

         ⟨∀s, u, w ∈ S : s → u ∧ sBw :
           (a) (uBw ∧ rankt(u, w) ≺ rankt(s, w)) ∨
           (b) ⟨∃v : w →^+ v : uBv⟩⟩


Definition 6 Well-founded Skipping [10]. B ⊆ S × S is a well-founded skipping
relation on TS M = ⟨S, →, L⟩ iff:

(WFSK1) ⟨∀s, w ∈ S : sBw : L.s = L.w⟩
(WFSK2) There exist functions rankt : S × S → W and rankl : S × S × S → ω
        such that ⟨W, ≺⟩ is well-founded and

        ⟨∀s, u, w ∈ S : s → u ∧ sBw :
          (a) ⟨∃v : w → v : uBv⟩ ∨
          (b) (uBw ∧ rankt(u, w) ≺ rankt(s, w)) ∨
          (c) ⟨∃v : w → v : sBv ∧ rankl(v, s, u) < rankl(w, s, u)⟩ ∨
          (d) ⟨∃v : w →^{≥2} v : uBv⟩⟩


Theorem 6 [10]. Let M = ⟨S, →, L⟩ be a TS and B ⊆ S × S. The following
statements are equivalent:

(i) B is an SKS on M;
(ii) B is a WFSK on M;
(iii) B is an RWFSK on M.


Recall the event processing systems AEPS and tEPS described in Sect. 1.1. 
When no events are scheduled to execute at a given time, say t, tEPS increments 
time t to the earliest time in future, say k > t, at which an event is scheduled 
for execution. Execution of an event can add an event that is scheduled to be 
executed at an arbitrary time in the future. Therefore, we cannot a priori determine
an upper bound on k. Using either of the above two proof methods to reason about
skipping refinement would require unbounded reachability analysis (conditions
RWFSK2b and WFSK2d), which is often difficult for automated verification tools. To
redress the situation, we develop two new proof methods for SKS; both require
only local reasoning about states and their successors.


Definition 7 Reduced Local Well-founded Skipping. B ⊆ S × S is a reduced local
well-founded skipping relation on TS M = ⟨S, →, L⟩ iff:

(RLWFSK1) ⟨∀s, w ∈ S : sBw : L.s = L.w⟩
(RLWFSK2) There exist functions rankt : S × S → W and rankls : S × S → ω
          such that ⟨W, ≺⟩ is well-founded, and a binary relation O ⊆ S × S
          such that

          ⟨∀s, u, w ∈ S : sBw ∧ s → u :
            (a) (uBw ∧ rankt(u, w) ≺ rankt(s, w)) ∨
            (b) ⟨∃v : w → v : uOv⟩⟩
          and
          ⟨∀x, y ∈ S : xOy :
            (c) xBy ∨
            (d) ⟨∃z : y → z : xOz ∧ rankls(z, x) < rankls(y, x)⟩⟩
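Because every condition of RLWFSK mentions only a state and its successors, a candidate B, O, and rank functions can be checked exhaustively on a finite system. The sketch below is illustrative: it takes W to be the natural numbers ordered by < (an assumption), encodes the system as a dict from states to successor sets, and uses a toy system, loosely modeled on tEPS and AEPS, in which the concrete state 0 jumps directly to 3 over an abstract chain 0 → 1 → 2 → 3.

```python
from typing import Callable, Dict, Hashable, Set, Tuple

Rel = Set[Tuple[Hashable, Hashable]]


def is_rlwfsk(trans: Dict[Hashable, Set[Hashable]],
              label: Callable[[Hashable], Hashable],
              B: Rel, O: Rel,
              rankt: Callable[[Hashable, Hashable], int],
              rankls: Callable[[Hashable, Hashable], int]) -> bool:
    """Check RLWFSK1 and RLWFSK2 on a finite TS, with W = naturals under <."""
    # RLWFSK1: B-related states are labeled identically.
    if any(label(s) != label(w) for (s, w) in B):
        return False
    # First conjunct of RLWFSK2: every step s -> u from sBw is matched locally.
    for (s, w) in B:
        for u in trans[s]:
            a = (u, w) in B and rankt(u, w) < rankt(s, w)
            b = any((u, v) in O for v in trans[w])
            if not (a or b):
                return False
    # Second conjunct: O-related pairs reach B while rankls decreases.
    for (x, y) in O:
        c = (x, y) in B
        d = any((x, z) in O and rankls(z, x) < rankls(y, x) for z in trans[y])
        if not (c or d):
            return False
    return True


# Toy system over tagged states: concrete C0 -> C3, abstract A0 -> A1 -> A2 -> A3.
trans = {("C", 0): {("C", 3)}, ("C", 3): {("C", 3)},
         **{("A", i): {("A", min(i + 1, 3))} for i in range(4)}}
label = lambda st: st[1]
B = {(("C", 0), ("A", 0)), (("C", 3), ("A", 3))}
O = {(("C", 3), ("A", i)) for i in (1, 2, 3)}
rankt = lambda u, w: 0
rankls = lambda y, x: 3 - y[1]          # abstract steps remaining to reach A3
print(is_rlwfsk(trans, label, B, O, rankt, rankls))   # True
```

No path of unbounded length is ever explored: the skip from C0 to C3 is justified by the O-chain (C3, A1), (C3, A2), (C3, A3) together with the decreasing rankls.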


Observe that to prove that a relation is an RLWFSK on a transition system, it 
is sufficient to reason about single steps of the transition system. Also, note that 
RLWFSK does not differentiate between skipping and stuttering on the right. 
This is based on an earlier observation that skipping subsumes stuttering. We 
used this observation to simplify the definition. However, it can often be useful to 
differentiate between skipping and stuttering. Next we define local well-founded 
skipping simulation (LWFSK), a characterization of skipping simulation that 
separates reasoning about skipping and stuttering on the right (Fig. 2). 


[Fig. 2: diagram with panels (a), (b), (c), (d)]


Fig. 2. Local well-founded skipping simulation (orange lines indicate that the states are
related by B and blue lines indicate that the states are related by O) (Color figure online)


Definition 8 Local Well-founded Skipping. B ⊆ S × S is a local well-founded
skipping relation on TS M = ⟨S, →, L⟩ iff:

(LWFSK1) ⟨∀s, w ∈ S : sBw : L.s = L.w⟩
(LWFSK2) There exist functions rankt : S × S → W, rankl : S × S × S → ω,
         and rankls : S × S → ω such that ⟨W, ≺⟩ is well-founded, and a
         binary relation O ⊆ S × S such that

         ⟨∀s, u, w ∈ S : sBw ∧ s → u :
           (a) ⟨∃v : w → v : uBv⟩ ∨
           (b) (uBw ∧ rankt(u, w) ≺ rankt(s, w)) ∨
           (c) ⟨∃v : w → v : sBv ∧ rankl(v, s, u) < rankl(w, s, u)⟩ ∨
           (d) ⟨∃v : w → v : uOv⟩⟩
         and
         ⟨∀x, y ∈ S : xOy :
           (e) xBy ∨
           (f) ⟨∃z : y → z : xOz ∧ rankls(z, x) < rankls(y, x)⟩⟩


Like RLWFSK, to prove that a relation is an LWFSK, reasoning about single
steps of the transition system suffices. However, LWFSK2c accounts for stuttering
on the right, and LWFSK2d along with LWFSK2e and LWFSK2f accounts
for skipping on the right. Also observe that states related by O are not required
to be labeled identically and may have no observable relationship to the states
related by B.
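The same kind of finite, local check works for LWFSK, now with the four separate cases of LWFSK2. As before, this is an illustrative sketch: W is taken to be the natural numbers under <, the system encoding is ours, and the toy system is the one used for RLWFSK (concrete C0 jumps to C3 over the abstract chain A0 → A1 → A2 → A3).

```python
from typing import Callable, Dict, Hashable, Set, Tuple

Rel = Set[Tuple[Hashable, Hashable]]


def is_lwfsk(trans: Dict[Hashable, Set[Hashable]],
             label: Callable[[Hashable], Hashable],
             B: Rel, O: Rel,
             rankt: Callable[[Hashable, Hashable], int],
             rankl: Callable[[Hashable, Hashable, Hashable], int],
             rankls: Callable[[Hashable, Hashable], int]) -> bool:
    """Check LWFSK1 and LWFSK2 on a finite TS, with W = naturals under <."""
    if any(label(s) != label(w) for (s, w) in B):                 # LWFSK1
        return False
    for (s, w) in B:
        for u in trans[s]:
            a = any((u, v) in B for v in trans[w])                 # matching step
            b = (u, w) in B and rankt(u, w) < rankt(s, w)         # stutter on the left
            c = any((s, v) in B and rankl(v, s, u) < rankl(w, s, u)
                    for v in trans[w])                             # stutter on the right
            d = any((u, v) in O for v in trans[w])                 # start of a skip
            if not (a or b or c or d):
                return False
    for (x, y) in O:
        e = (x, y) in B
        f = any((x, z) in O and rankls(z, x) < rankls(y, x) for z in trans[y])
        if not (e or f):
            return False
    return True


trans = {("C", 0): {("C", 3)}, ("C", 3): {("C", 3)},
         **{("A", i): {("A", min(i + 1, 3))} for i in range(4)}}
label = lambda st: st[1]
B = {(("C", 0), ("A", 0)), (("C", 3), ("A", 3))}
O = {(("C", 3), ("A", i)) for i in (1, 2, 3)}
rankt = lambda u, w: 0
rankl = lambda v, s, u: 0
rankls = lambda y, x: 3 - y[1]
print(is_lwfsk(trans, label, B, O, rankt, rankl, rankls))   # True
```

Without O, none of the cases (a) to (d) can justify the concrete step from C0 to C3, so the check fails.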


Soundness and Completeness. We next show that RLWFSK and LWFSK 
in fact completely characterize skipping simulation, i.e., RLWFSK and LWFSK 
are sound and complete proof rules. Thus, if a concrete system M_C is a skipping
refinement of M_A, one can always effectively reason about it using RLWFSK
and LWFSK.


Theorem 7. Let M = (S,—, L) be a transition system and B C S x S. The 
following statements are equivalent: 


(i) B is an SKS on M;
(ii) B is a WFSK on M;
(iii) B is an RWFSK on M;
(iv) B is an RLWFSK on M;
(v) B is an LWFSK on M.


Proof: The equivalence of (i), (ii) and (iii) follows from Theorem 6. That (iv) 
implies (v) follows from the simple observation that RLWFSK2 implies LWFSK2. 
To complete the proof, we prove the following two implications. We prove below 
that (v) implies (ii) in Lemma 4 and that (iii) implies (iv) in Lemma 5. 


Lemma 4. If B is an LWFSK on M, then B is a WFSK on M.


Proof. Let B be an LWFSK on M. WFSK1 follows directly from LWFSK1. Let
rankt, rankl, and rankls be functions, and O be a binary relation such that
LWFSK2 holds. To show that WFSK2 holds, we use the same rankt and rankl
functions; let s, u, w ∈ S with s → u and sBw. LWFSK2a, LWFSK2b, and
LWFSK2c are equivalent to WFSK2a, WFSK2b, and WFSK2c, respectively, so
we show that if only LWFSK2d holds, then WFSK2d holds. Since LWFSK2d
holds, there is a successor v of w such that uOv. Since uOv holds, either
LWFSK2e or LWFSK2f must hold between u and v. However, since LWFSK2a
does not hold, LWFSK2e cannot hold and LWFSK2f must hold, i.e., there exists
a successor v' of v such that uOv' ∧ rankls(v', u) < rankls(v, u). So, we need
a path of at least 2 steps from w to satisfy the universally quantified constraint
on O. Let us consider an arbitrary path δ such that δ.0 = w, δ.1 = v,
δ.2 = v', uOδ.i, LWFSK2e does not hold between u and δ.i for i ≥ 1, and
rankls(δ(i + 1), u) < rankls(δ.i, u). Notice that any such path must be finite
because rankls is well-founded. Hence, δ is a finite path and there exists a k ≥ 2
such that LWFSK2e holds between u and δ.k. Therefore, WFSK2d holds, i.e.,
there is a state in δ reachable from w in two or more steps which is related to u
by B.


Lemma 5. If B is an RWFSK on M, then B is an RLWFSK on M.


Proof. Let B be an RWFSK on M. RLWFSK1 follows directly from RWFSK1.
To show that RLWFSK2 holds, we use any rankt function that can be used to
show that RWFSK2 holds. We define O as follows.

$$O = \{(u, v) : \langle \exists z : v \rightarrow^{*} z : uBz \rangle\}$$

We define rankls(v, u) to be the minimal length of an M-segment that starts at
v and ends at a state, say z, such that uBz, if such a segment exists, and 0
otherwise. Let s, u, w ∈ S, sBw, and s → u. If RWFSK2a holds between s, u,
and w, then RLWFSK2a also holds. Next, suppose that RWFSK2a does not hold
but RWFSK2b holds, i.e., there is an M-segment ⟨w, a, ..., v⟩ such that uBv;
therefore, uOa and RLWFSK2b holds.

To finish the proof, we show that O and rankls satisfy the constraints imposed
by the second conjunct in RLWFSK2. Let x, y ∈ S with xOy and ¬(xBy). From the
definition of O, there is an M-segment from y to a state related to x by B; let y⃗
be such a segment of minimal length. From the definition of rankls, we have
rankls(y, x) = |y⃗|. Observe that y cannot be the last state of y⃗, so |y⃗| ≥ 2. This
is because the last state in y⃗ must be related to x by B, but by assumption
¬(xBy). Let y' be the successor of y in y⃗. Clearly, xOy'; therefore,
rankls(y', x) ≤ |y⃗| − 1 < rankls(y, x), since the length of a minimal M-segment from
y' to a state related to x by B must be less than or equal to |y⃗| − 1.
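For a finite system, the O and rankls constructed in the proof of Lemma 5 can be computed by breadth-first search: uOv holds iff a B-related state is reachable from v, and rankls(v, u) counts the states on a shortest such segment. The sketch below uses helper names and a toy abstract chain of our own; on infinite systems the search need not terminate, which is why the lemma only asserts existence.

```python
from collections import deque
from typing import Dict, Hashable, Set, Tuple


def lemma5_witness(trans: Dict[Hashable, Set[Hashable]],
                   B: Set[Tuple[Hashable, Hashable]],
                   u: Hashable, v: Hashable) -> Tuple[bool, int]:
    """Return (uOv, rankls(v, u)) following the construction in Lemma 5:
    uOv iff some z with v ->* z satisfies uBz, and rankls(v, u) is the number of
    states in a shortest M-segment <v, ..., z> with uBz (0 if none exists)."""
    seen, queue = {v}, deque([(v, 1)])
    while queue:
        y, length = queue.popleft()        # BFS visits states by segment length
        if (u, y) in B:
            return True, length
        for z in trans[y]:
            if z not in seen:
                seen.add(z)
                queue.append((z, length + 1))
    return False, 0


trans = {"A0": {"A1"}, "A1": {"A2"}, "A2": {"A3"}, "A3": {"A3"}}
B = {("C3", "A3")}
print([lemma5_witness(trans, B, "C3", a) for a in ["A1", "A2", "A3"]])
# [(True, 3), (True, 2), (True, 1)]
```

Along the segment A1, A2, A3 the computed rank drops by one per step, which is exactly the decrease RLWFSK2d requires.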


5 Case Study (Event Processing System) 


In this section, we analyze the correctness of an optimized event processing 
system (PEPS) that uses a priority queue to find an event scheduled to execute 
at any given time. We show that PEPS refines AEPS, a simple event processing 
system described in Sect. 1. Our goal is to illustrate the benefits of the theory 
of skipping refinement and the associated local proof methods developed in the 
paper. We use ACL2s [3], an interactive theorem prover, to define the operational
semantics of the systems and mechanize a proof of the correctness of PEPS.


Operational Semantics of PEPS: A state of the PEPS system is a triple
(tm, otevs, mem), where tm is a natural number denoting the current time, otevs is a
set of timed-event pairs denoting the scheduler that is ordered with respect to a 
total order te-< on timed-event pairs, and mem is a collection of variable-integer 
pairs denoting the shared memory. The transition function of PEPS is defined 
as follows: if there are no events in otevs, then PEPS just increments the cur- 
rent time by 1. Otherwise, it picks the first timed-event pair, say (e,t) in otevs, 
executes it and updates the time to t. The execution of an event may result in 
adding new timed-events to the scheduler, removing existing timed-events from 
the scheduler and updating the memory. Finally, the executed timed-event is 
removed from the scheduler. This is a simple, generic model of an event pro- 
cessing system. Notice that the ability to remove events can be used to specify 
systems with preemption [23]: an event scheduled to execute at some future time 
may be canceled (and possibly rescheduled to be executed at a different time in 
future) as a result of the execution of an event that preempts it. Notice that, for 
a given total order, PEPS is a deterministic system. 

The execution of an event is modeled using three constrained functions that 
take as input an event, ev, a time, t, and a memory, mem: step-events-add 
returns the set of new timed-event pairs to add to the scheduler; step-events-rm 
returns the set of timed-event pairs to remove from the scheduler; and 
step-memory returns a memory updated as specified by the event. We place 
minimal constraints on these functions. For example, we only require that 
step-events-add returns a set of event-time pairs of the form (e, t_e), where
t_e is greater than the current time t. The constrained functions are defined using
the encapsulate construct in ACL2 and can be instantiated with any executable 
definitions that satisfy these constraints without affecting the proof of correct- 
ness of PEPS. Moreover, note that the particular choice of the total order on 
timed-event pairs is irrelevant to the proof of correctness of PEPS. 
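The PEPS transition function can be sketched in Python as follows. This is not the ACL2s model: the three event functions are hypothetical instances standing in for the constrained functions (only the "strictly later time" constraint on step-events-add is taken from the text, and it is checked at run time), pairs are stored as (time, event) so that Python's tuple ordering plays the role of the total order te-<, and applying removals before additions is our assumption.

```python
from typing import Dict, FrozenSet, Tuple

TimedEvent = Tuple[int, str]   # (time, event): tuple order stands in for te-<


# Hypothetical instances of the constrained functions.
def step_events_add(ev: str, t: int, mem: Dict[str, int]) -> FrozenSet[TimedEvent]:
    return frozenset({(t + 3, "tick")}) if ev == "start" else frozenset()


def step_events_rm(ev: str, t: int, mem: Dict[str, int]) -> FrozenSet[TimedEvent]:
    return frozenset({(t + 1, "timeout")}) if ev == "start" else frozenset()


def step_memory(ev: str, t: int, mem: Dict[str, int]) -> Dict[str, int]:
    new = dict(mem)
    new[ev] = new.get(ev, 0) + 1          # count executions of each event
    return new


def peps_step(state):
    """One PEPS step on (tm, otevs, mem), with otevs a sorted tuple of pairs."""
    tm, otevs, mem = state
    if not otevs:
        return (tm + 1, otevs, mem)        # no events: advance time by 1
    t, ev = otevs[0]                       # first timed-event pair in the order
    add = step_events_add(ev, t, mem)
    assert all(te > t for te, _ in add)    # constraint on step-events-add
    rm = step_events_rm(ev, t, mem)
    rest = (set(otevs[1:]) - rm) | add     # drop the executed pair, apply rm and add
    return (t, tuple(sorted(rest)), step_memory(ev, t, mem))


s1 = peps_step((0, ((2, "start"), (3, "timeout")), {}))
s2 = peps_step(s1)
print(s1)   # (2, ((5, 'tick'),), {'start': 1})
print(s2)   # (5, (), {'start': 1, 'tick': 1})
```

Time jumps from 0 to 2 and then to 5 without visiting the intermediate instants, the same skipping behavior as tEPS; the "start" event also cancels the pending "timeout", which illustrates preemption.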


Stepwise Refinement: We show that PEPS refines AEPS using a stepwise 
refinement approach: first we define an intermediate system HPEPS obtained by 
augmenting PEPS with history information and show that PEPS is a simulation 
refinement of HPEPS. Second, we show that HPEPS is a skipping refinement of 
AEPS. Finally, we appeal to Theorems 1 and 4 to infer that PEPS refines AEPS. 
Note that the compositionality of skipping refinement enables us to decompose 
the proof into a sequence of refinement proofs, each of which is simpler. Moreover, 
the history information in HPEPS is helpful in defining the witnessing binary 
relation and the rank function required to prove skipping refinement. 

An HPEPS state is a four-tuple (tm, otevs, mem, h), where tm, otevs, and mem are
respectively the current time, an ordered set of timed events, and a collection of
variable-integer pairs, and h is the history information. The history information h
consists of a Boolean variable valid, a time tm, an ordered set of timed-event
pairs otevs, and a memory mem. Intuitively, h records the state preceding the
current state. The transition function of HPEPS is the same as the transition function
of PEPS except that HPEPS also records the history in h.


PEPS Refines HPEPS: Observe that, modulo the history information, a step 
of PEPS directly corresponds to a step of HPEPS, i.e., PEPS is a bisimula- 
tion refinement of HPEPS under a refinement map that projects a PEPS state 
(tm, otevs, mem) to the HPEPS state (tm, otevs, mem, h) where the valid com- 
ponent of h is set to false. But we only prove that it is a simulation refinement, 
because, from Theorem 1, it suffices to establish that PEPS is a skipping refine- 
ment of HPEPS. The proofs primarily require showing that two sets of ordered
timed-events that are set equivalent are in fact equal and that adding and
removing equivalent sets of timed-events to and from equal schedulers results in
equal schedulers.


HPEPS Refines AEPS: Next we show that HPEPS is a skipping refine- 
ment of AEPS under the refinement map R, a function that simply projects an 
HPEPS state to an AEPS state. To show that HPEPS is a skipping refinement 
of AEPS under the refinement map R, from Definition 4, we must show as wit- 
ness a binary relation B that satisfies the two conditions. Let B = {(s, R.s) : 
s is an HPEPS state}. To establish that B is an SKS on the disjoint union of 
HPEPS and AEPS, we have a choice of four proof-methods (Sect. 4). Recall that 
execution of an event can add a new event scheduled to be executed at an arbi- 
trary time in the future. As a result, if we were to use WFSK or RWFSK, the proof 
obligations from conditions WFSK2d (Definition 5) and RWFSK2b (Definition 6) 
would require unbounded reachability analysis, something that typically places a 
big burden on verification tools and their users. In contrast, the proof obligations 
to establish RLWFSK are local and only require reasoning about states and their 
successors, which significantly reduces the proof complexity. 

RLWFSK1 holds trivially. To prove that RLWFSK2 holds, we define a binary
relation O and a rank function rankl and show that they satisfy the two
universally quantified formulas in RLWFSK2. Moreover, since HPEPS does not stutter,
we ignore RLWFSK2a, which is why we do not define rankt. Finally, our proof
obligation is: for all HPEPS states $s, u$ and AEPS states $w$ such that $s \rightarrow u$ and $sBw$
hold, there exists an AEPS state $v$ such that $w \rightarrow v$ and $uOv$ holds.
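To give intuition for why this obligation is local (it only mentions states and their immediate successors), the following Python sketch checks it by brute force on small, explicitly enumerated systems. It is only an illustration under assumptions: the successor maps, the relations B and O, and the toy counter systems are hypothetical, whereas the ACL2s proof discharges the obligation symbolically for all states.

```python
def local_obligation_counterexample(impl_next, spec_next, B, O):
    """Check, on explicit finite systems: for all s -> u in the implementation
    and every w with (s, w) in B, some v with w -> v in the specification
    satisfies (u, v) in O.

    impl_next / spec_next map each state to an iterable of successors;
    B and O are sets of (implementation state, specification state) pairs.
    Returns a counterexample (s, u, w), or None if the obligation holds.
    """
    related = {}
    for s, w in B:
        related.setdefault(s, []).append(w)
    for s, succs in impl_next.items():
        for u in succs:
            for w in related.get(s, []):
                if not any((u, v) in O for v in spec_next.get(w, ())):
                    return (s, u, w)
    return None


# Toy example (hypothetical): the specification counts 0 -> 1 -> 2 -> 3 -> 4,
# while the implementation skips every other value: 0 -> 2 -> 4.
spec_next = {i: [i + 1] for i in range(4)}
impl_next = {0: [2], 2: [4], 4: []}
B = {(s, s) for s in impl_next}
# O relates u to specification states v from which u's B-partner is reachable (v <= u).
O = {(u, v) for u in impl_next for v in range(5) if v <= u}
print(local_obligation_counterexample(impl_next, spec_next, B, O))  # None
print(local_obligation_counterexample(impl_next, spec_next, B, B))  # (0, 2, 0)
```

The second call shows why a relation like O is needed: with O = B, one specification step cannot keep up with an implementation step that skips.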


Verification Effort: We used the defdata framework in ACL2s to specify
the data definitions for the three systems and the definec construct to intro- 
duce function definitions along with their input-contracts (pre-conditions) and 
output-contracts (post-conditions). In addition to admitting a data definition, 
defdata proves several theorems about the functions that are extremely help- 
ful in automatically discharging type-like proof obligations. We also developed a 
library to concisely describe functions using higher-order constructs like map and 
reduce, which made some of the definitions clearer. ACL2s supports first-order 
quantifiers via the defun-sk construct, which essentially amounts to the use 
of Hilbert’s choice operator. We use defun-sk to model the transition relation 
for AEPS (a non-deterministic system) and to specify the proof obligations for 
proving that HPEPS refines AEPS. However, support for automated reasoning 
about quantifiers is limited in ACL2. Therefore, we use domain knowledge,
when possible (e.g., a system is deterministic), to eliminate quantifiers in the 
proof obligations and provide explicit witnesses for existential quantifiers. 

The proof makes essential use of several libraries available in ACL2 for reason- 
ing about lists and sets. In addition, we prove a collection of additional lemmas 
that can be roughly categorized into four categories. First, we have a collection 
of lemmas to prove the input-output contracts of the functions. Second, we have 
a collection of lemmas to show that operations on the schedulers in the three 
systems preserve various invariants, e.g., that any timed-event in the scheduler 
is scheduled to execute at a time greater than or equal to the current time. Third, we
have a collection of lemmas to show that inserting and removing two equivalent 
sets of timed-events from a scheduler results in an equivalent scheduler. And 
fourth, we have a collection of lemmas to show that two schedulers are equiva- 
lent iff they are set equal. The above lemmas are used to establish a relationship 
between priority queues, a data structure used by the implementation system, 
and sets, the corresponding data structure used in the specification system. The 
behavioral difference between the two systems is accounted for by the notion 
of skipping refinement. This separation significantly eases understanding as well 
as mechanical reasoning about the correctness of reactive systems. We have 8 
top-level proof obligations and a few dozen supporting lemmas. The entire proof 
takes about 120s on a machine with 2.2GHz Intel Core i7 with 16GB main 
memory. 


6 Related Work 


Several notions of correctness have been proposed in the literature and their
properties have been widely studied [2,5,11,16,17]. In this paper, we develop a
theory of skipping refinement to effectively prove the correctness of optimized
reactive systems using automated verification tools. These results establish skipping
refinement on par with notions of refinement based on (bi)simulation [22] and
stuttering (bi)simulation [20,24], in the sense that skipping refinement is (1)
compositional and (2) admits local proof methods. Together, these two properties
have been instrumental in significantly reducing the proof complexity in the
verification of large and complex systems. We developed the theory of skipping
refinement using a generic model of transition systems and placed no restrictions
on the state space size or the branching factor of the transition system. Any
system with a well-defined operational semantics can be mapped to a labeled
transition system. Moreover, the local proof methods are sound and complete,
i.e., if an implementation is a skipping refinement of the specification, we can
always use the local proof methods to effectively reason about it.
Refinement-based methodologies have been successfully used to verify the
correctness of several realistic hardware and software systems. In [13], several
complex concurrent programs were verified using a stepwise refinement
methodology. Kragl and Qadeer [13] also develop a compact representation
to facilitate the description of programs at different levels of abstraction and
the associated refinement proofs. Several back-end compiler transformations are proved
correct in CompCert [15] using simulation refinement. In [25], several compiler
transformations were verified using stuttering refinement and associated local
proof methods. Recently, refinement-based methodology has also been applied
to verify the correctness of practical distributed systems [8] and a
general-purpose operating system microkernel [12]. The full verification of CertiKOS
[6,7], an OS kernel, is based on the notion of simulation refinement.
Refinement-based approaches have also been extensively used to verify microprocessor
designs [1,9,19,21,26]. Skipping refinement was used to verify the correctness of
optimized memory controllers and a JVM-inspired stack machine [10].


7 Conclusion and Future Work 


In this paper, we developed the theory of skipping refinement. Skipping refine- 
ment is designed to reason about the correctness of optimized reactive systems, a 
class of systems where a single transition in a concrete low-level implementation 
may correspond to a sequence of observable steps in the corresponding abstract 
high-level specification. Examples of such systems include optimizing compilers, 
concurrent and parallel systems and superscalar processors. We developed sound 
and complete proof methods that reduce global reasoning about infinite compu- 
tations of such systems to local reasoning about states and their successors. We 
also showed that the skipping simulation is closed under composition and there- 
fore is amenable to modular reasoning using a stepwise refinement approach. We 
experimentally validated our results by analyzing the correctness of an optimized 
event-processing system in ACL2s. For future work, we plan to precisely classify 
temporal logic properties that are preserved by skipping refinement. This would 
enable us to transfer temporal properties from specifications to implementations, 
after establishing refinement. 
Abstract. We solve in a purely symbolic way the robust controller
synthesis problem in timed automata with Büchi acceptance conditions. The
goal of the controller is to play according to an accepting lasso of the
automaton, while resisting timing perturbations chosen by a competing
environment. The problem was previously shown to be PSPACE-complete
using region-based techniques, but we provide a first tool solving
the problem using zones only, thus more resilient to the state-space
explosion problem. The key ingredient is the introduction of branching
constraint graphs, which allow us to decide in polynomial time whether a given
lasso is robust, and even to compute the largest admissible perturbation if
it is. We also make an original use of constraint graphs in this context
in order to test the inclusion of timed reachability relations, crucial for
the termination criterion of our algorithm. Our techniques are illustrated
using a case study on the regulation of a train network.


1 Introduction 


Timed automata [1] extend finite-state automata with timing constraints, pro- 
viding an automata-theoretic framework to design, model, verify and synthesise 
real-time systems. However, the semantics of timed automata is a mathemati- 
cal idealisation: it assumes that clocks have infinite precision and instantaneous 
actions. Proving that a timed automaton satisfies a property does not ensure 
that a real implementation of it also does. This robustness issue is a challeng- 
ing problem for embedded systems [12], and alternative semantics have been 
proposed, so as to ensure that the verified (or synthesised) behaviour remains 
correct in the presence of small timing perturbations.

We are interested in a fundamental controller synthesis problem in
timed automata equipped with a Büchi acceptance condition: it consists
in determining whether there exists an accepting infinite execution.
Thus, the role of the controller is to choose transitions and delays. This
problem has been studied extensively in the exact setting [13-15,17,19,27,28].
In the context of robustness, this strategy should be tolerant to small
perturbations of the delays. This discards strategies suffering from weaknesses such
as Zeno behaviours, or even non-Zeno behaviours requiring infinite precision, as
exhibited in [6].

More formally, the semantics we consider is defined as a game that depends
on some parameter $\delta$ representing an upper bound on the amplitude of the
perturbation [7]. In this game, the controller plays against an antagonistic
environment that can perturb each delay using a value chosen in the interval $[-\delta, \delta]$.
The case of a fixed value of $\delta$ has been shown to be decidable in [7], and also for
a related model in [18]. However, these algorithms are based on regions and,
as the value of $\delta$ may be very different from the constants appearing in the guards
of the automaton, they do not yield practical algorithms. Moreover, the maximal
perturbation is not necessarily known in advance, and could be considered as part
of the design process.

The problem we are interested in is qualitative: we want to determine whether
there exists a positive value of $\delta$ such that the controller wins the game. It has
been proven in [25] that this problem is in PSPACE (and even PSPACE-complete),
thus no harder than in the exact setting with no perturbation allowed [1].
However, the algorithm heavily relies on regions, and more precisely on an abstraction
that refines regions, namely folded orbit graphs. Hence, it is not at
all amenable to implementation.

Our objective is to provide an efficient symbolic algorithm for solving this 
problem. To this end, we target the use of zones instead of regions, as they 
allow an on-demand partitioning of the state space. Moreover, the algorithm we 
develop explores the reachable state-space in a forward manner. This is known 
to lead to better performance, as witnessed by the successful tool UPPAAL
TIGA based on forward algorithms for solving controller synthesis problems [5]. 

Our algorithm can be understood as an adaptation to the robustness
setting of the standard algorithm for Büchi acceptance in timed automata [17].
This algorithm looks for an accepting lasso using a double depth-first search. A
major difficulty consists in checking whether a lasso can be robustly iterated,
i.e. whether there exists $\delta > 0$ such that the controller can follow the cycle for
an infinite number of steps while being tolerant to perturbations of amplitude at
most $\delta$. The key argument of [25] was the notion of aperiodic folded orbit graph
of a path in the region automaton, thus tightly connected to regions. Lifting this 
notion to zones seems impossible as it makes an important use of the fact that 
valuations in regions are time-abstract bisimilar, which is not the case for zones. 

Our contributions are threefold. First, we provide a polynomial time proce- 
dure to decide, given a lasso, whether it can be robustly iterated. This sym- 
bolic algorithm relies on a computation of the greatest fixpoint of the operator 
describing the set of controllable predecessors of a path. In order to provide 
an argument of termination for this computation, we resort to a new notion of 
branching constraint graphs, extending the approach used in [16,26] and based
on constraint graphs (introduced in [8]) to check iterability of a cycle, without
robustness requirements. Second, we show that when considering a lasso,
not only can we decide robust iterability, but we can even compute the largest
perturbation under which it is controllable. This problem was not known to
be decidable before. Finally, we provide a termination criterion for the
analysis of lassos. Focusing on zones is not complete: it can be the case that two
cycles lead to the same zones, but one is robustly iterable while the other one is
not. Robust iterability crucially depends on the real-time dynamics of the cycle
and we prove that it actually only depends on the reachability relation of the
path. We provide a polynomial-time algorithm for checking inclusion between
reachability relations of paths in timed automata based on constraint graphs. It
is worth noticing that all our procedures can be implemented using difference
bound matrices, a very efficient data structure used for timed systems. These
developments have been integrated in a tool, and we present a case study of a
train regulation network illustrating its performance.

Fig. 1. A timed automaton

Integrating the robustness question in the verification of real-time systems
has attracted attention in the community, and recent works include, for
instance, robust model checking for timed automata under clock drifts [23], Lip- 
schitz robustness notions for timed systems [11], quantitative robust synthesis 
for timed automata [2]. Stability analysis and synthesis of stabilizing controllers 
in hybrid systems are a closely related topic, see e.g. [20,21]. 


2 Timed Automata: Reachability and Robustness 


Let $X = \{x_1, \dots, x_n\}$ be a finite set of clock variables. It is extended with a
virtual clock $x_0$, constantly equal to 0, and we denote by $X_0$ the set $X \cup \{x_0\}$.
An atomic clock constraint on $X$ is a formula $x - y \le k$ or $x - y < k$ with
$x \neq y \in X_0$ and $k \in \mathbb{Q}$. A constraint is non-diagonal if one of the two clocks
is $x_0$. We denote by $\mathrm{Guards}(X)$ (respectively, $\mathrm{Guards}_{nd}(X)$) the set of (clock)
constraints (respectively, non-diagonal clock constraints) built as conjunctions
of atomic clock constraints (respectively, non-diagonal atomic clock constraints).

A clock valuation $v$ is an element of $\mathbb{R}_{\ge 0}^{X}$. It is extended to $\mathbb{R}_{\ge 0}^{X_0}$ by letting
$v(x_0) = 0$. For all $d \in \mathbb{R}_{\ge 0}$, we let $v + d$ be the valuation defined by
$(v + d)(x) = v(x) + d$ for all clocks $x \in X$. If $Y \subseteq X$, we also let $v[Y \leftarrow 0]$ be the
valuation resetting clocks in $Y$ to 0, without modifying values of other clocks. A
valuation $v$ satisfies an atomic clock constraint $x - y \prec k$ (with $\prec \in \{<, \le\}$) if
$v(x) - v(y) \prec k$. The satisfaction relation is then extended to clock constraints
naturally: the satisfaction of constraint $g$ by a valuation $v$ is denoted by $v \models g$.
The set of valuations satisfying a constraint $g$ is denoted by $[\![g]\!]$.
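As a concrete reading of these definitions, here is a small Python sketch in which a valuation is a list indexed by clock, with index 0 reserved for $x_0$, and an atomic constraint $x_i - x_j \prec k$ is a tuple (i, j, k, strict). This encoding and the function names are assumptions made for illustration only.

```python
def delay(v, d):
    """v + d: every clock x_1..x_n advances by d; index 0 is the reference clock x_0 = 0."""
    return [v[0]] + [x + d for x in v[1:]]


def reset(v, Y):
    """v[Y <- 0]: clocks whose index is in Y are set to 0, the others are unchanged."""
    return [0 if i in Y else x for i, x in enumerate(v)]


def satisfies(v, g):
    """v |= g, where g is a conjunction of atomic constraints (i, j, k, strict),
    each meaning x_i - x_j < k (strict) or x_i - x_j <= k (non-strict)."""
    return all((v[i] - v[j] < k) if strict else (v[i] - v[j] <= k) for i, j, k, strict in g)


v = [0, 1, 0.5]                     # x_0 = 0, x_1 = 1, x_2 = 0.5
w = reset(delay(v, 1), {1})         # delay by 1, then reset x_1
guard = [(2, 0, 2, False),          # x_2 <= 2
         (0, 2, -1, True)]          # x_0 - x_2 < -1, i.e. x_2 > 1
print(w, satisfies(w, guard))       # [0, 0, 1.5] True
```

Lower bounds such as $x_2 > 1$ are written against $x_0$, which is exactly why the virtual clock makes every atomic constraint a difference constraint.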

A timed automaton is a tuple $A = (L, \ell_0, E, L_f)$ with $L$ a finite set of locations,
$\ell_0 \in L$ an initial location, $E \subseteq L \times \mathrm{Guards}_{nd}(X) \times 2^X \times L$ a finite set of
edges, and $L_f$ a set of accepting locations.

An example of a timed automaton is depicted in Fig. 1, where the reset of a
clock $x$ is denoted by $x := 0$. The semantics of the timed automaton $A$ is defined
as an infinite transition system $[\![A]\!] = (S, s_0, \rightarrow)$. The set $S$ of states of $[\![A]\!]$ is
$L \times \mathbb{R}_{\ge 0}^{X}$, and $s_0 = (\ell_0, \mathbf{0})$. A transition of $[\![A]\!]$ is of the form $(\ell, v) \xrightarrow{d, e} (\ell', v')$ with
$e = (\ell, g, Y, \ell') \in E$ and $d \in \mathbb{R}_{\ge 0}$ such that $v + d \models g$ and $v' = (v + d)[Y \leftarrow 0]$.
We call a path a finite sequence of edges in the timed automaton. The
reachability relation of a path $p$, denoted by $\mathrm{Reach}(p)$, is the set of pairs $(v, v')$
such that there is a sequence of transitions of $[\![A]\!]$ starting from $(\ell, v)$, ending
in $(\ell', v')$, that follows the edges of $p$ in order, where $\ell$ and $\ell'$ are the first and last
locations of $p$. A run of $A$ is an infinite sequence of transitions of $[\![A]\!]$ starting from $s_0$. We are
interested in Büchi objectives. Therefore, a run is accepting if there exists a final
location $\ell_f \in L_f$ that the run visits infinitely often.

As done classically, we assume that every clock is bounded in $A$ by a
constant $M$, that is, we only consider the previous infinite transition system over
the subset $L \times [0, M]^{X}$ of states.
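The transition relation of $[\![A]\!]$ and the reachability relation of a path can be illustrated with the following Python sketch, where valuations are lists indexed by clock (index 0 for $x_0$) and an edge is a tuple (source, guard, resets, target). The edges e1 and e2 are hypothetical one-clock edges, not those of Fig. 1, and the clock bound $M$ is omitted.

```python
def satisfies(v, g):
    """v |= g for atomic constraints (i, j, k, strict): x_i - x_j < k or <= k."""
    return all((v[i] - v[j] < k) if strict else (v[i] - v[j] <= k) for i, j, k, strict in g)


def fire(state, d, edge):
    """Transition (l, v) --d,e--> (l', v') for e = (l, g, Y, l'):
    enabled iff d >= 0 and v + d |= g; then v' = (v + d)[Y <- 0].
    Returns the successor state, or None if the transition is not enabled."""
    loc, v = state
    src, guard, resets, dst = edge
    if loc != src or d < 0:
        return None
    moved = [v[0]] + [x + d for x in v[1:]]        # index 0 is x_0, always 0
    if not satisfies(moved, guard):
        return None
    return dst, [0 if i in resets else x for i, x in enumerate(moved)]


def follow(state, steps):
    """Fire (delay, edge) pairs in order; None if some step is blocked.
    (v, v') is in Reach(p) exactly when some choice of delays lets the edges
    of p lead from (l, v) to (l', v')."""
    for d, edge in steps:
        state = fire(state, d, edge)
        if state is None:
            return None
    return state


# Hypothetical one-clock edges (not taken from Fig. 1).
e1 = ("a", [(1, 0, 2, False)], {1}, "b")       # guard x_1 <= 2, reset x_1
e2 = ("b", [(0, 1, -1, False)], set(), "a")    # guard x_1 >= 1
print(follow(("a", [0, 0]), [(1.5, e1), (1, e2)]))  # ('a', [0, 1])
print(follow(("a", [0, 0]), [(3, e1)]))             # None
```

Enumerating delays like this only samples $\mathrm{Reach}(p)$; computing it symbolically is the purpose of the constraint graphs of Sect. 3.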

We study the robustness problem introduced in [25], which is stated in terms
of games where a controller fights against an environment. After a prefix of a
run, the controller will have the capability to choose delays and transitions to
fire, whereas the environment perturbs the delays chosen by the controller with
a small parameter $\delta > 0$. The aim of the controller will be to find a strategy so
that, no matter how the environment plays, he is ensured to generate an infinite
run satisfying the Büchi condition. Formally, given a timed automaton
$A = (L, \ell_0, E, L_f)$ and $\delta > 0$, the perturbation game is a two-player turn-based game
$G_\delta(A)$ between a controller and an environment. Its state space is partitioned
into $S_C \uplus S_E$ where $S_C = L \times \mathbb{R}_{\ge 0}^{X}$ belongs to the controller, and
$S_E = L \times \mathbb{R}_{\ge 0}^{X} \times \mathbb{R}_{>0} \times E$ to the environment. The initial state is $(\ell_0, \mathbf{0}) \in S_C$. From each state
$(\ell, v) \in S_C$, there is a transition to $(\ell, v, d, e) \in S_E$ with $e = (\ell, g, Y, \ell') \in E$
whenever $d > \delta$ and $v + d + \epsilon \models g$ for all $\epsilon \in [-\delta, \delta]$. Then, from each state
$(\ell, v, d, (\ell, g, Y, \ell')) \in S_E$, there is a transition to $(\ell', (v + d + \epsilon)[Y \leftarrow 0]) \in S_C$
for all $\epsilon \in [-\delta, \delta]$. A play of $G_\delta(A)$ is a finite or infinite path $q_0 \xrightarrow{t_1} q_1 \xrightarrow{t_2} q_2 \cdots$
where $q_0 = (\ell_0, \mathbf{0})$ and $t_i$ is a transition from state $q_{i-1}$ to $q_i$, for all $i > 0$. It is
said to be maximal if it is infinite or cannot be extended with any transition.
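The controller's side condition quantifies over infinitely many perturbations, but it can be tested with finitely many checks. The Python sketch below, with hypothetical function names and valuations as lists indexed by clock (index 0 for $x_0$), relies on the observation that each atomic constraint is linear in $\epsilon$ (diagonal ones do not depend on it at all), so the admissible perturbations form an interval and it suffices to test $\epsilon = \pm\delta$.

```python
def satisfies(v, g):
    """v |= g for atomic constraints (i, j, k, strict): x_i - x_j < k or <= k."""
    return all((v[i] - v[j] < k) if strict else (v[i] - v[j] <= k) for i, j, k, strict in g)


def shifted(v, d):
    """v + d, keeping the reference clock x_0 (index 0) at 0."""
    return [v[0]] + [x + d for x in v[1:]]


def controller_move_allowed(v, d, guard, delta):
    """The controller may propose (d, e) from (l, v) iff d > delta and
    v + d + eps |= g for every eps in [-delta, delta].
    The admissible eps form an interval, so both endpoints suffice."""
    return d > delta and all(satisfies(shifted(v, d + eps), guard) for eps in (-delta, delta))


def environment_move(v, d, eps, resets, delta):
    """The environment resolves the perturbation: (v + d + eps)[Y <- 0], |eps| <= delta."""
    if abs(eps) > delta:
        raise ValueError("perturbation outside [-delta, delta]")
    return [0 if i in resets else x for i, x in enumerate(shifted(v, d + eps))]


guard = [(1, 0, 1, False)]                                   # x_1 <= 1
print(controller_move_allowed([0, 0], 0.5, guard, 0.25))     # True: x_1 lands in [0.25, 0.75]
print(controller_move_allowed([0, 0.4], 0.5, guard, 0.25))   # False: x_1 may exceed 1
```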

A strategy for the controller is a function $\sigma_{Con}$ mapping each non-maximal
play ending in some $(\ell, v) \in S_C$ to a pair $(d, e)$ where $d > 0$ and $e \in E$ such that
there is a transition from $(\ell, v)$ to $(\ell, v, d, e)$. A strategy for the environment is
a function $\sigma_{Env}$ mapping each finite play ending in $(\ell, v, d, e)$ to a state $(\ell', v')$
related by a transition. A play gives rise to a unique run of $[\![A]\!]$ by only keeping
states in $S_C$. For a pair of strategies $(\sigma_{Con}, \sigma_{Env})$, we let $\mathrm{play}_\delta(\sigma_{Con}, \sigma_{Env})$
denote the run associated with the unique maximal play of $G_\delta(A)$ that follows
the strategies. The controller's strategy $\sigma_{Con}$ is winning (with respect to the Büchi
objective $L_f$) if for all strategies $\sigma_{Env}$ of the environment, $\mathrm{play}_\delta(\sigma_{Con}, \sigma_{Env})$ is
infinite and visits infinitely often some location of $L_f$. The parametrised robust
controller synthesis problem asks, given a timed automaton $A$, whether there
exists $\delta > 0$ such that the controller has a winning strategy in $G_\delta(A)$.


Example 1. The controller has a winning strategy in $G_\delta(A)$, with $A$ the
automaton of Fig. 1, for all possible values of $\delta < 1/2$. Indeed, he can follow the cycle
$\ell_0 \rightarrow \ell_3 \rightarrow \ell_0$ by always picking time delay $1/2$ so that, when arriving in $\ell_3$
(resp. $\ell_0$) after the perturbation of the environment, clock $x_2$ (resp. $x_1$) has a
valuation in $[1/2 - \delta, 1/2 + \delta]$. Therefore, he can play forever following this
memoryless strategy. For $\delta > 1/2$, the environment can enforce reaching $\ell_3$ with a
value for $x_2$ at least equal to 1. The guard $x_2 < 2$ of the next transition to $\ell_0$
cannot be guaranteed, and therefore the controller cannot win $G_\delta(A)$. In [25],
it is shown that the cycle around $\ell_2$ does not provide a winning strategy for
the controller for any value of $\delta > 0$ since perturbations accumulate so that the
controller can only play it a finite number of times in the worst case.


By [25], the parametrised robust controller synthesis problem is known to be
PSPACE-complete. Their solution is based on the region automaton of $A$. We are
seeking a more practical solution using zones. A zone $Z$ over $X$ is a convex
subset of $\mathbb{R}_{\ge 0}^{X}$ defined as the set of valuations satisfying a clock constraint $g$,
i.e. $Z = [\![g]\!]$. Zones can be encoded into difference-bound matrices (DBMs), which
are $|X_0| \times |X_0|$-matrices over $(\mathbb{R} \times \{<, \le\}) \cup \{(\infty, <)\}$. We adopt the following
notation: for a DBM $\mathbf{M}$, we write $\mathbf{M} = (M, \prec^{\mathbf{M}})$, where $M$ is the matrix made of
the first components, with elements in $\mathbb{R} \cup \{\infty\}$, while $\prec^{\mathbf{M}}$ is the matrix of the
second components, with elements in $\{<, \le\}$. A DBM $\mathbf{M}$ naturally represents
a zone (which we abusively write $\mathbf{M}$ as well), defined as the set of valuations $v$
such that, for all $x, y \in X_0$, $v(x) - v(y) \prec^{\mathbf{M}}_{x,y} M_{x,y}$ (where $v(x_0) = 0$). Coefficients
of a DBM are thus pairs $(\prec, c)$. As usual, these can be compared: $(\prec, c)$ is less
than $(\prec', c')$ (denoted by $(\prec, c) < (\prec', c')$) whenever $c < c'$, or $c = c'$ with $\prec$ equal to $<$
and $\prec'$ equal to $\le$. Moreover, these coefficients can be added: $(\prec, c) + (\prec', c')$ is the
pair $(\prec'', c + c')$ with $\prec''$ equal to $\le$ if $\prec$ and $\prec'$ are both $\le$, and $\prec''$ equal to $<$ otherwise.
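The order and addition on DBM coefficients can be written down directly. In this minimal Python sketch, the Bound type and its field names are illustrative assumptions: a coefficient $(\prec, c)$ is stored as (c, strict) with strict=True for $<$, and $(<, \infty)$ plays the role of "no constraint".

```python
import math
from typing import NamedTuple


class Bound(NamedTuple):
    """A DBM coefficient (≺, c): strict=True encodes '<', strict=False encodes '≤'."""
    c: float
    strict: bool


INF = Bound(math.inf, True)          # (<, ∞): no constraint at all


def less_than(a: Bound, b: Bound) -> bool:
    """(≺, c) < (≺', c') iff c < c', or c = c' with ≺ being '<' and ≺' being '≤'."""
    return a.c < b.c or (a.c == b.c and a.strict and not b.strict)


def add(a: Bound, b: Bound) -> Bound:
    """(≺, c) + (≺', c') = (≺'', c + c'), with ≺'' = '≤' iff both ≺ and ≺' are '≤'."""
    return Bound(a.c + b.c, a.strict or b.strict)


def tighter(a: Bound, b: Bound) -> Bound:
    """The smaller of two coefficients, i.e. the one giving the stronger constraint."""
    return b if less_than(b, a) else a


print(less_than(Bound(2, True), Bound(2, False)))   # True: x < 2 is tighter than x <= 2
print(add(Bound(1, False), Bound(2, True)))         # Bound(c=3, strict=True)
print(tighter(INF, Bound(5, False)))                # Bound(c=5, strict=False)
```

Addition is what chains two constraints along a path, and `tighter` is what keeps the stronger of two parallel constraints; together they are all a shortest-path computation over DBMs needs.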

DBMs were introduced in [4,10] for analyzing timed automata; we refer
to [3] for details. Standard operations used to explore the state space of
timed automata have been defined on DBMs: intersection is written $\mathbf{M} \wedge \mathbf{N}$,
$\mathrm{Pretime}_{>t}(\mathbf{M})$ is the set of valuations such that a time delay of more than $t$
time units leads to the zone $\mathbf{M}$, and $\mathrm{Unreset}_R(\mathbf{M})$ is the set of valuations that end
in $\mathbf{M}$ when the clocks in $R$ are reset. From a robustness perspective, we also
consider the operator $\mathrm{shrink}_{[-\delta,\delta]}(\mathbf{M})$, introduced in [24], defined as the set of
valuations $v$ such that $v + [-\delta, \delta] \subseteq \mathbf{M}$. Given a DBM $\mathbf{M}$ and a rational number $\delta$,
all these operations can be effectively computed in time cubic in $|X|$.
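To illustrate the cubic bound and the shrinking operator, here is a Python sketch on a DBM stored as a list of lists, where entry [i][j] bounds $x_i - x_j$ and index 0 is $x_0$. Tightening uses the standard Floyd-Warshall closure, which is the cubic step. The implementation of shrink is an assumption derived from the definition above rather than the code of any DBM library: by convexity, $v + [-\delta, \delta] \subseteq \mathbf{M}$ iff $v - \delta$ and $v + \delta$ belong to $\mathbf{M}$, which only affects the bounds involving $x_0$.

```python
import math

INF = (math.inf, True)     # hypothetical encoding: (c, strict) stands for (≺, c)
LE_ZERO = (0, False)       # (≤, 0)


def key(b):
    """Order on coefficients: smaller constant first; at equal constants '<' is tighter."""
    return (b[0], 0 if b[1] else 1)


def add(a, b):
    return (a[0] + b[0], a[1] or b[1])


def canonical(m):
    """Floyd-Warshall tightening, O(n^3); m[i][j] bounds x_i - x_j, index 0 is x_0.
    Returns None if the zone is empty (a diagonal entry drops below (≤, 0))."""
    n = len(m)
    m = [row[:] for row in m]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                cand = add(m[i][k], m[k][j])
                if key(cand) < key(m[i][j]):
                    m[i][j] = cand
    if any(key(m[i][i]) < key(LE_ZERO) for i in range(n)):
        return None
    return m


def shrink(m, delta):
    """shrink_[-δ,δ](M) = {v | v + [-δ, δ] ⊆ M}.
    A uniform shift leaves every x_i - x_j (i, j ≠ 0) unchanged, so only the
    bounds on x_i - x_0 and x_0 - x_i lose δ."""
    out = [row[:] for row in m]
    for i in range(1, len(m)):
        out[i][0] = (m[i][0][0] - delta, m[i][0][1])
        out[0][i] = (m[0][i][0] - delta, m[0][i][1])
    return canonical(out)


# Zone 1 <= x_1 <= 3 over a single clock.
zone = [[LE_ZERO, (-1, False)],
        [(3, False), LE_ZERO]]
print(shrink(zone, 0.5))   # [[(0, False), (-1.5, False)], [(2.5, False), (0, False)]]
print(shrink(zone, 2))     # None: no valuation tolerates perturbations of size 2
```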


3 Reachability Relation of a Path 


Before treating the robustness issues, we start by designing a symbolic (i.e. zone- 
based) approach to describe and compare the reachability relations of paths 
in timed automata. This will be crucial subsequently to design a termination
criterion in the state space exploration of our robustness-checking algorithm. 
Solving the inclusion of reachability relations in a symbolic manner has inde- 
pendent interest and can have other applications. 

The reachability relation $\mathrm{Reach}(p)$ of a path $p$ is a subset of $\mathbb{R}^{X \cup X'}$, where $X'$
are primed versions of the clocks, such that $(v, v') \in \mathrm{Reach}(p)$ iff there is
a run from valuation $v$ to valuation $v'$ following $p$. Unfortunately, reachability
relations $\mathrm{Reach}(p)$ are not zones in general, that is, they cannot be represented
using only difference constraints. In fact, we shall see shortly that constraints of
the form $x - y + z - u < c$ also appear, as already observed in [22]. We thus
cannot rely directly on the traditional difference bound matrices (DBMs) used to
represent zones. We instead rely on the constraint graphs that were introduced
in [8], and explored in [16] for the parametric case (the latter work considers
enlarged constraints, and not shrunk ones as we study here). Our contribution
is to use these graphs to obtain a syntactic check of inclusion of the corresponding
reachability relations.


Constraint Graphs. Rather than considering the values of the clocks in $X$,
this data structure considers the date $X_i$ of the latest reset of the clock $x_i$,
and uses a new variable $T$ denoting the global timestamp. Note that the clock
values can be recovered easily since $X_i = T - x_i$. For the extra clock $x_0$, we
introduce variable $X_0$ equal to the global timestamp $T$ (since $x_0$ must remain
equal to 0). A constraint graph defining a zone is a weighted graph whose nodes
are $\mathcal{X} = \{X_0, X_1, \dots, X_n\}$. Constraints on clocks are represented by weights on
edges in the graph: a constraint $X - Y \prec c$ is represented by an edge from $X$
to $Y$ weighted by $(\prec, c)$, with $\prec \in \{<, \le\}$ and $c \in \mathbb{Q}$. Weights in the graph
are thus pairs of the form $(\prec, c)$, which are compared and added like DBM coefficients.
Therefore, we can compute shortest weights between two vertices of a weighted graph.
A cycle is said to be negative if it has weight at most $(<, 0)$, i.e. $(<, 0)$ or $(\prec, c)$ with $c < 0$.
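The encoding of clock constraints as edges between reset dates, and the emptiness test via negative cycles, can be sketched in Python as follows. The (c, strict) encoding of weights and the function names are hypothetical; shortest weights are computed with Floyd-Warshall, which handles the strict/non-strict weights directly, whereas a plain relaxation-based check would miss cycles of weight exactly $(<, 0)$.

```python
import math

INF = (math.inf, True)       # hypothetical encoding: (c, strict) stands for (≺, c)
LE_ZERO = (0, False)


def key(b):
    return (b[0], 0 if b[1] else 1)


def add(a, b):
    return (a[0] + b[0], a[1] or b[1])


def constraint_graph(n, constraints):
    """Weights w[u][v] over nodes X_0 (= T), X_1, ..., X_n; an edge u -> v of weight
    (≺, c) encodes X_u - X_v ≺ c. A clock constraint x_i - x_j ≺ c, given as
    (i, j, c, strict), becomes X_j - X_i ≺ c because x_i = T - X_i."""
    w = [[INF] * (n + 1) for _ in range(n + 1)]
    for u in range(n + 1):
        w[u][u] = LE_ZERO
    for i, j, c, strict in constraints:
        if key((c, strict)) < key(w[j][i]):
            w[j][i] = (c, strict)
    return w


def has_negative_cycle(w):
    """Shortest weights by Floyd-Warshall; a negative cycle (weight at most (<, 0))
    shows up as a node reaching itself with weight strictly below (≤, 0)."""
    n = len(w)
    d = [row[:] for row in w]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                cand = add(d[i][k], d[k][j])
                if key(cand) < key(d[i][j]):
                    d[i][j] = cand
    return any(key(d[u][u]) < key(LE_ZERO) for u in range(n))


# x_1 < 2 and x_1 > 2 (i.e. x_0 - x_1 < -2): the cycle X_0 -> X_1 -> X_0 weighs (<, 0).
print(has_negative_cycle(constraint_graph(1, [(1, 0, 2, True), (0, 1, -2, True)])))   # True
# x_1 <= 2 and x_1 >= 2: the cycle weighs (≤, 0), so the zone {x_1 = 2} is non-empty.
print(has_negative_cycle(constraint_graph(1, [(1, 0, 2, False), (0, 1, -2, False)])))  # False
```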


Encoding Paths. Constraint graphs can also encode tuples of valuations seen
along a path. To encode a $k$-step computation, we make $k + 1$ copies of the
nodes, that is, $\mathcal{X}^i = \{X^i_0, X^i_1, \dots, X^i_n\}$ for $i \in \{1, \dots, k + 1\}$. These copies are
also called layers. Let us first consider an example on the path $p$ consisting of the
edge from $\ell_1$ to $\ell_2$, and the edge from $\ell_2$ to $\ell_1$, in the timed automaton of Fig. 1.
The constraint graph $G_p$ is depicted in Fig. 3: in our diagrams of constraint
graphs, the absence of labels on an edge means the constant 0 (strict or not according
to the arrow style), and we depict with an edge with arrows on both ends the presence
of an edge in both directions. The graph has five columns, each containing copies of
the variables for that step: they represent the valuations before the first edge, after
the first time elapse, after the first reset, after the second time elapse and after the
second reset. In general, each elementary operation can be described by a constraint
graph with two layers $(X_i)$ (before) and $(X'_i)$ (after).


- The operation $\mathrm{Pretime}_{>t}$ is described by the constraint graph $G_{\mathrm{Pretime}_{>t}}$
with edges $X_i \rightarrow X_0$ and $X_i \leftrightarrow X'_i$ for $i > 0$, and an edge $X_0 \rightarrow X'_0$
labelled by $(<, -t)$. Figure 3 contains two occurrences of such a graph; we always
represent with dashed arrows edges that are labelled by $(<, c)$, and plain arrows
edges that are labelled with $(\le, c)$; the absence of an edge means that it is
labelled with $(<, \infty)$.
- The operation $g \wedge \mathrm{Unreset}_Y(\cdot)$, to test a guard $g$ and reset the clocks in $Y$,
is described by the constraint graph $G_{g,Y}$ with edges $X_0 \leftrightarrow X'_0$ (meaning
that the time does not elapse), $X_i \leftrightarrow X'_i$ for $i$ such that clock $x_i \notin Y$,
$X'_i \leftrightarrow X'_0$ for $i$ such that clock $x_i \in Y$, and, for all clock constraints
$x_i - x_j \prec c$ appearing in $g$, an edge from $X_j$ to $X_i$ labelled by $(\prec, c)$ (since it
encodes the fact that $(T - X_i) - (T - X_j) = X_j - X_i \prec c$). In Fig. 3, we have first
the graph for the edge from $\ell_1$ to $\ell_2$, and then the one for the edge from $\ell_2$ to $\ell_1$.


Constraint graphs can be stacked one after the other to obtain the constraint
graph of an edge $e$, and then of a path $p$, which we denote by $G_p$. In the resulting
graph, there is one leftmost layer of vertices $(X^\ell_i)_i$ and one rightmost one $(X^r_i)_i$
representing the situation before and after the firing of the path $p$. Once this
graph is constructed, the intermediary levels can be eliminated after replacing
each edge between the nodes of $\mathcal{X}^\ell \cup \mathcal{X}^r$ by the shortest path in the graph. This
phase is hereafter called normalisation of the constraint graph. The normalised
version of the constraint graph of Fig. 3 is depicted on its right.
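The construction of $G_p$ from the two elementary graphs, followed by normalisation, can be sketched as follows. This Python code is an illustration under assumptions: the node numbering, the (c, strict) weight encoding, and the class and method names are hypothetical, and normalisation is a plain Floyd-Warshall over all layers before keeping only the leftmost and rightmost ones.

```python
import math

INF = (math.inf, True)       # (c, strict) stands for (≺, c)
LE0 = (0, False)             # (≤, 0)


def key(b):
    return (b[0], 0 if b[1] else 1)


def add(a, b):
    return (a[0] + b[0], a[1] or b[1])


class LayeredGraph:
    """Constraint graph over layers of date variables X^k_0..X^k_n (X^k_0 is the
    timestamp of layer k). An edge u -> v of weight (≺, c) encodes u - v ≺ c."""

    def __init__(self, n_clocks, n_layers):
        self.n = n_clocks + 1
        size = self.n * n_layers
        self.w = [[INF] * size for _ in range(size)]
        for u in range(size):
            self.w[u][u] = LE0

    def node(self, layer, i):
        return layer * self.n + i

    def edge(self, u, v, bound):
        if key(bound) < key(self.w[u][v]):
            self.w[u][v] = bound

    def both(self, u, v):
        self.edge(u, v, LE0)
        self.edge(v, u, LE0)

    def pretime(self, a, b, t, strict=True):
        """Time elapse of more than t (at least t if strict=False) from layer a to b."""
        for i in range(1, self.n):
            self.edge(self.node(a, i), self.node(a, 0), LE0)    # clocks are non-negative
            self.both(self.node(a, i), self.node(b, i))         # reset dates unchanged
        self.edge(self.node(a, 0), self.node(b, 0), (-t, strict))  # T - T' ≺ -t

    def guard_reset(self, a, b, guard, resets):
        """guard: tuples (i, j, c, strict) for x_i - x_j ≺ c; resets: clock indices."""
        self.both(self.node(a, 0), self.node(b, 0))             # no time elapses
        for i in range(1, self.n):
            if i in resets:
                self.both(self.node(b, i), self.node(b, 0))     # X'_i = T'
            else:
                self.both(self.node(a, i), self.node(b, i))
        for i, j, c, strict in guard:
            self.edge(self.node(a, j), self.node(a, i), (c, strict))

    def normalise(self, first, last):
        """Shortest weights, then keep only layers `first` and `last`.
        Returns None on a negative cycle, i.e. when Reach(p) is empty."""
        size = len(self.w)
        d = [row[:] for row in self.w]
        for k in range(size):
            for i in range(size):
                for j in range(size):
                    cand = add(d[i][k], d[k][j])
                    if key(cand) < key(d[i][j]):
                        d[i][j] = cand
        if any(key(d[u][u]) < key(LE0) for u in range(size)):
            return None
        keep = [self.node(first, i) for i in range(self.n)]
        keep += [self.node(last, i) for i in range(self.n)]
        return [[d[u][v] for v in keep] for u in keep]


# One hypothetical edge: delay d > 0, then guard x_1 <= 2 and reset of x_1.
g = LayeredGraph(n_clocks=1, n_layers=3)
g.pretime(0, 1, t=0)
g.guard_reset(1, 2, guard=[(1, 0, 2, False)], resets={1})
w = g.normalise(0, 2)          # nodes: X^l_0, X^l_1, X^r_0, X^r_1
print(w[2][1])                 # (2, False): T^r - X^l_1 <= 2, x_1 was at most 2 when firing
print(w[0][2])                 # (0, True): T^l - T^r < 0, time strictly elapsed

h = LayeredGraph(n_clocks=1, n_layers=3)
h.pretime(0, 1, t=3)           # delay d > 3 ...
h.guard_reset(1, 2, guard=[(1, 0, 2, False)], resets={1})   # ... but guard x_1 <= 2
print(h.normalise(0, 2))       # None: the path cannot be fired
```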


From Constraint Graphs to Reachability Relations. From a logical point
of view, the elimination of intermediary layers reflects an elimination of
quantifiers in a formula of the first-order theory of real numbers. At the end, we obtain
a set of constraints of the form $X^k_i - X^{k'}_j \prec c$ with $k, k' \in \{\ell, r\}$. These
constraints do not reflect uniquely the reachability relation $\mathrm{Reach}(p)$, in the sense
that it is possible that $\mathrm{Reach}(p_1) = \mathrm{Reach}(p_2)$ but the normalised versions of
$G_{p_1}$ and $G_{p_2}$ are different. For example, if we consider the path $p^2$ obtained by
repeating the cycle $p$ between $\ell_1$ and $\ell_2$, the reachability relation does not change
($\mathrm{Reach}(p^2) = \mathrm{Reach}(p)$), but the normalised constraint graph does ($G_{p^2} \neq G_p$):
all labels $(<, 2)$ of the red dotted edges from the rightmost layer to the leftmost
layer become $(<, 4)$, and the labels $(<, -2)$ of the dashed blue edges become
$(<, -4)$.

We solve this issue by jumping back from variables $X^k_i$ to the clock valuations.
Indeed, in terms of clock valuations $v^\ell$ and $v^r$ before and after the path, the
constraint $X^k_i - X^{k'}_j \prec c$ (for $k, k' \in \{\ell, r\}$) rewrites as
$(T^k - v^k(x_i)) - (T^{k'} - v^{k'}(x_j)) \prec c$, where $T^\ell$ is the global timestamp before firing $p$ and $T^r$ the one after.
When $k = k'$, variables $T^\ell$ and $T^r$ disappear, leaving a constraint of the form
$v^k(x_j) - v^k(x_i) \prec c$. When $k \neq k'$, we can rewrite the constraint as
$T^k - T^{k'} \prec v^k(x_i) - v^{k'}(x_j) + c$. We therefore obtain upper and lower bounds on the value
of $T^r - T^\ell$, allowing us to eliminate $T^r - T^\ell$ considered as a single variable. We
therefore obtain in fine a formula mixing constraints of the form


- $v^k(x_a) - v^k(x_b) \prec p$, with $k \in \{\ell, r\}$ and $a \neq b$, and we define $\gamma^k_{a,b} = (\prec, p)$;

- $v^\ell(x_a) - v^\ell(x_b) + v^r(x_c) - v^r(x_d) \prec p$, with $a \neq b$ and $c \neq d$, and we define
$\gamma_{a,b,c,d} = (\prec, p)$. This constraint can appear in two ways: either from
$v^r(x_c) - v^\ell(x_b) - p_1 \prec_1 T^r - T^\ell \prec_2 v^r(x_d) - v^\ell(x_a) + p_2$ by eliminating $T^r - T^\ell$,
or by adding the two constraints of the form $v^\ell(x_a) - v^\ell(x_b) \prec_1 p_1$ and
$v^r(x_c) - v^r(x_d) \prec_2 p_2$. Thus, $\gamma_{a,b,c,d}$ is obtained as the minimum of the two
constraints obtained in this manner. In other terms, in the constraint graph,
this constraint is the minimal weight between the sum of the weights of the
edges $(X^r_d, X^\ell_a)$ and $(X^\ell_b, X^r_c)$, and the sum of the weights of the edges $(X^\ell_b, X^\ell_a)$
and $(X^r_d, X^r_c)$. For example, in the path in Fig. 3, we have $\gamma_{1,0,0,2} = (<, 0)$
since the two constraints are $(<, 0)$ and $(<, \infty)$, whereas $\gamma_{1,2,2,1} = (<, 0)$
because the two constraints are $(<, 2)$ and $(<, 0)$.
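Reading the constants $\gamma$ off a normalised graph is a direct transcription of the two derivations listed above. In this Python sketch, the normalised graph is assumed to be a matrix over $X^\ell_0, \dots, X^\ell_n, X^r_0, \dots, X^r_n$ with entry [u][v] bounding $u - v$, and the dictionary keys chosen for $\Gamma^p$ are a hypothetical layout; the weights in the usage example are hand-made for illustration, not taken from Fig. 3.

```python
import math
from itertools import permutations

INF = (math.inf, True)        # hypothetical encoding: (c, strict) stands for (≺, c)


def key(b):
    return (b[0], 0 if b[1] else 1)


def add(a, b):
    return (a[0] + b[0], a[1] or b[1])


def gamma_vector(w, n_clocks):
    """Read the constants γ off a normalised constraint graph w over
    [X^l_0..X^l_n, X^r_0..X^r_n], where w[u][v] bounds u - v.
    Inside one layer v(x_a) - v(x_b) = X_b - X_a, so γ^k_{a,b} is the weight of
    X^k_b -> X^k_a; γ_{a,b,c,d} is the smaller of the two derivations."""
    m = n_clocks + 1
    left = range(m)                       # positions of X^l_0..X^l_n
    right = range(m, 2 * m)               # positions of X^r_0..X^r_n
    gamma = {}
    for a, b in permutations(range(m), 2):
        gamma[("l", a, b)] = w[left[b]][left[a]]
        gamma[("r", a, b)] = w[right[b]][right[a]]
    for a, b in permutations(range(m), 2):
        for c, d in permutations(range(m), 2):
            via_time = add(w[right[d]][left[a]], w[left[b]][right[c]])  # eliminate T^r - T^l
            via_sum = add(w[left[b]][left[a]], w[right[d]][right[c]])   # add one-sided bounds
            gamma[(a, b, c, d)] = min(via_time, via_sum, key=key)
    return gamma


# One clock: nodes X^l_0, X^l_1, X^r_0, X^r_1 with hand-made weights.
w = [[INF] * 4 for _ in range(4)]
for u in range(4):
    w[u][u] = (0, False)
w[1][0] = (0, False)          # X^l_1 - X^l_0 <= 0, i.e. x_1 >= 0 before the path
w[2][1] = (2, False)          # X^r_0 - X^l_1 <= 2
w[0][3] = (0, True)           # X^l_0 - X^r_1 < 0
gamma = gamma_vector(w, n_clocks=1)
print(gamma[("l", 0, 1)])     # (0, False): v^l(x_0) - v^l(x_1) <= 0
print(gamma[(1, 0, 1, 0)])    # (2, True): v^l(x_1) + v^r(x_1) < 2
```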


Let $\varphi(G)$ be the conjunction of such constraints obtained from a constraint
graph G once normalised: this is a quantifier-free formula of the additive theory 
of reals. We obtain the following property whose proof mimics the one for proving 
the normalisation of DBMs (and can be derived from the developments of [8]). 


Lemma 1. Let p be a path in a timed automaton. If Gp contains a negative 
cycle, then Reach(p) = Ø. Otherwise, Reach(p) is the set of pairs of valuations 
(v£, v") that satisfy the formula y(G,). 
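Lemma 1 reduces emptiness of $\mathrm{Reach}(\rho)$ to detecting a negative cycle while normalising the constraint graph. A minimal Python sketch of that normalisation, assuming our own convention that a weight $(\lhd, c)$ is encoded as a pair `(c, k)` with `k = 0` for $<$ and `k = 1` for $\leq$; the name `normalise` is hypothetical. Note that a cycle of weight $(<, 0)$ counts as negative, since it is tighter than $(\leq, 0)$.

```python
import math

LT, LE = 0, 1            # (c, LT) encodes (<, c); (c, LE) encodes (<=, c)
INF = (math.inf, LT)     # absent edge


def add(b1, b2):
    """Sum of two weights; strict if either operand is strict."""
    return (b1[0] + b2[0], min(b1[1], b2[1]))


def normalise(w):
    """Floyd-Warshall closure of a square matrix of weights whose diagonal is
    (0, LE). Returns None when some cycle is negative, i.e. when a diagonal
    entry becomes tighter than (<=, 0); otherwise returns the closed matrix."""
    n = len(w)
    d = [row[:] for row in w]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = add(d[i][k], d[k][j])
                if via_k < d[i][j]:
                    d[i][j] = via_k
    if any(d[i][i] < (0, LE) for i in range(n)):
        return None
    return d


# Made-up two-node graphs: a consistent one, and one with a strict zero-weight cycle.
CONSISTENT = [[(0, LE), (3, LE)], [(-1, LE), (0, LE)]]
STRICT_ZERO_CYCLE = [[(0, LE), (2, LE)], [(-2, LT), (0, LE)]]
```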


Checking Inclusion. For a path $\rho$, we regroup the pairs $(\gamma^l_{a,b})$, $(\gamma^r_{a,b})$ and
$(\gamma_{a,b,c,d})$ above in a single vector $\Gamma^\rho$. We extend the comparison relation $\leq$ to
these vectors by applying it componentwise. These vectors can be used to check
equality or inclusion of reachability relations in time $O(|X|^4)$:


Theorem 1. Let $\rho$ and $\rho'$ be paths in a timed automaton such that $\mathrm{Reach}(\rho)$ and
$\mathrm{Reach}(\rho')$ are non-empty. Then $\mathrm{Reach}(\rho) \subseteq \mathrm{Reach}(\rho')$ if and only if $\Gamma^\rho \leq \Gamma^{\rho'}$.
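Theorem 1 turns inclusion of reachability relations into a componentwise comparison of constants. A Python sketch under our own assumptions: $\Gamma^\rho$ is stored as a dict from hypothetical keys such as `("l", a, b)`, `("r", a, b)` and `(a, b, c, d)` to weights encoded as `(c, k)` pairs (`k = 0` for $<$, `k = 1` for $\leq$), and both relations are non-empty, as the theorem requires.

```python
LT, LE = 0, 1  # a weight (<, c) is encoded as (c, LT), a weight (<=, c) as (c, LE)


def included(gamma_p, gamma_q):
    """Reach(p) is included in Reach(q) iff Gamma^p <= Gamma^q componentwise
    (Theorem 1). Both reachability relations are assumed non-empty."""
    if gamma_p.keys() != gamma_q.keys():
        raise ValueError("both vectors must be indexed by the same constraints")
    return all(gamma_p[key] <= gamma_q[key] for key in gamma_p)


def equivalent(gamma_p, gamma_q):
    """Equality of reachability relations: inclusion in both directions."""
    return included(gamma_p, gamma_q) and included(gamma_q, gamma_p)


# Hypothetical vectors with one entry of each shape.
G_P = {("l", 1, 2): (0, LE), (1, 2, 2, 1): (0, LT)}
G_Q = {("l", 1, 2): (1, LT), (1, 2, 2, 1): (0, LT)}
```

Since there are $O(|X|^4)$ entries of the form $\gamma_{a,b,c,d}$, this comparison runs in time $O(|X|^4)$, matching the bound stated above.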


Notice that we do not need to check equivalence or implication of the formulas
$\varphi(G_\rho)$ and $\varphi(G_{\rho'})$, but simply compare syntactically the constants appearing in these
formulas. Moreover, these constants can be stored in usual DBMs on $2 \times |X|$
clocks, allowing for reusability of classical DBM libraries. For the constraint
graph in Fig. 3, we have seen that $G_{\rho^2} \neq G_\rho$, even if $\mathrm{Reach}(\rho^2) = \mathrm{Reach}(\rho)$.
However, we can check that $\varphi(G_{\rho^2}) = \varphi(G_\rho)$, as expected.


Computation of Pre and Post. By Lemma 1 and the construction of constraint
graphs, one can easily compute $\mathrm{Pre}_\rho(Z) = \{v \mid \exists v' \in Z,\ ((\ell, v), (\ell', v')) \in \mathrm{Reach}(\rho)\}$
for a given path $\rho$ and zone $Z$ (see [8,16]). In fact, consider the
normalised constraint graph $G_\rho$ on nodes $X^l \cup X^r$. To compute $\mathrm{Pre}_\rho(Z)$, one
just needs to add the constraints of $Z$ on $X^r$. This is done by replacing each
edge $X^r_i \xrightarrow{(\lhd, c)} X^r_j$ by $X^r_i \xrightarrow{\min((\lhd, c), Z_{j,i})} X^r_j$, where $Z_{j,i} = (\lhd, p)$ defines the constraint
of $Z$ on $x_j - x_i$. Then, the normalisation of the graph describes the reachability
relation along path $\rho$ ending in zone $Z$. Furthermore, projecting the constraints
to $X^l$ yields $\mathrm{Pre}_\rho(Z)$: this can be obtained by gathering all constraints on pairs
of nodes of $X^l$. A reachability relation can thus be seen as a function assigning
to each zone $Z$ its image by $\rho$. One can symmetrically compute the successor
$\mathrm{Post}_\rho(Z) = \{v' \mid \exists v \in Z,\ ((\ell, v), (\ell', v')) \in \mathrm{Reach}(\rho)\}$ by constraining the
nodes $X^l$ and projecting to $X^r$.
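The constrain-normalise-project recipe for $\mathrm{Pre}_\rho$ and $\mathrm{Post}_\rho$ can be sketched in Python. This is a simplified illustration under assumptions of our own: the relation is stored directly as one DBM over a shared zero reference, a left copy and a right copy of the clocks (rather than over the timestamp variables $X^l$, $X^r$), all bounds are non-strict, entry `d[i][j]` bounds $v_i - v_j$, and `REL` is a made-up relation for one clock $x$ with $1 \leq x' - x \leq 3$.

```python
import math

INF = math.inf


def close(d):
    """Floyd-Warshall closure; d[i][j] is an upper bound on v_i - v_j."""
    n = len(d)
    d = [row[:] for row in d]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def constrain_and_project(rel, zone, constrained, kept):
    """Intersect the relation with `zone` on the variables `constrained`,
    normalise, then read off the constraints among the variables `kept`."""
    d = [row[:] for row in rel]
    for a, ia in enumerate(constrained):
        for b, ib in enumerate(constrained):
            d[ia][ib] = min(d[ia][ib], zone[a][b])
    d = close(d)
    return [[d[i][j] for j in kept] for i in kept]


def layers(n):
    # Variable 0 is the shared zero; 1..n is the left copy, n+1..2n the right copy.
    return [0] + list(range(1, n + 1)), [0] + list(range(n + 1, 2 * n + 1))


def pre(rel, zone, n):
    left, right = layers(n)
    return constrain_and_project(rel, zone, constrained=right, kept=left)


def post(rel, zone, n):
    left, right = layers(n)
    return constrain_and_project(rel, zone, constrained=left, kept=right)


# Made-up relation over variables (zero, x, x'): x >= 0, x' >= 0, 1 <= x' - x <= 3.
REL = [[0, 0, 0],
       [INF, 0, -1],
       [INF, 3, 0]]

if __name__ == "__main__":
    print(pre(REL, [[0, 0], [2, 0]], 1))   # Pre of {x <= 2}: [[0, 0], [1, 0]], i.e. 0 <= x <= 1
    print(post(REL, [[0, 0], [1, 0]], 1))  # Post of {0 <= x <= 1}: [[0, -1], [4, 0]], i.e. 1 <= x' <= 4
```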
4 Robust Iterability of a Lasso


In this section, we study the perturbation game $\mathcal{G}_\delta(A)$ between the two players
(controller and environment), as defined in Sect. 2, when the timed automaton $A$
is restricted to a fixed lasso $\rho_1\rho_2$, i.e. $\rho_1$ is a path from $\ell_0$ to some accepting
location $\ell_t$, and $\rho_2$ is a cyclic path around $\ell_t$. This implies that the controller does
not have the choice of the transitions, but only of the delays. We will consider
different settings, in which $\delta$ is fixed or not.


Controllable Predecessors and their Greatest Fixpoints. Consider an
edge $e = (\ell, g, R, \ell')$. For any set $Z \subseteq \mathbb{R}_{\geq 0}^X$, we define the controllable predecessors
of $Z$ as follows: $\mathrm{CPre}^\delta_e(Z) = \mathrm{Pretime}_{\geq\delta}(\mathrm{shrink}_{[-\delta,\delta]}(g \cap \mathrm{Unreset}_R(Z)))$. Intuitively,
$\mathrm{CPre}^\delta_e(Z)$ is the set of valuations from which the controller can ensure reaching $Z$
in one step, following the edge $e$, regardless of the perturbations of amplitude at
most $\delta$ of the environment. In fact, it can delay into $\mathrm{shrink}_{[-\delta,\delta]}(g \cap \mathrm{Unreset}_R(Z))$
with a delay of at least $\delta$, where under any perturbation in $[-\delta, \delta]$, the valuation
satisfies the guard, and it ends, after reset, in $Z$. Results of [24] show that this
operator can be computed in cubic time with respect to the number of clocks.
We extend this operator to a path $\rho$ by composition, and denote it by $\mathrm{CPre}^\delta_\rho$. Note
that $\mathrm{CPre}^0_\rho = \mathrm{Pre}_\rho$ is the usual predecessor operator without perturbation.

This operator is monotone, hence its greatest fixpoint $\nu X\, \mathrm{CPre}^\delta_\rho(X)$ is well-defined,
equal to $\bigcap_{i \geq 0} (\mathrm{CPre}^\delta_\rho)^i(\top)$: it corresponds to the valuations from which
the controller can guarantee to loop forever along the path $\rho$. By definition of
the game $\mathcal{G}_\delta(A)$ where $A$ is restricted to the lasso $\rho_1\rho_2$, the controller wins the
game if and only if $0 \in \mathrm{CPre}^\delta_{\rho_1}(\nu X\, \mathrm{CPre}^\delta_{\rho_2}(X))$. As a consequence, our problem
reduces to the computation of this greatest fixpoint.


Branching Constraint Graphs. We consider first a fixed (rational) value of
the parameter $\delta$, and are interested in the computation of the greatest fixpoint
$\nu X\, \mathrm{CPre}^\delta_\rho(X)$. In [16], constraint graphs were used to provide a termination
criterion allowing one to compute the greatest fixpoint of the classical predecessor
operator $\mathrm{CPre}^0_\rho$. We generalize this approach to deal with the operator $\mathrm{CPre}^\delta_\rho$,
and to this end, we need to generalize constraint graphs so as to encode it.
Unfortunately, the operator $\mathrm{shrink}_{[-\delta,\delta]}$ cannot be encoded in a constraint graph.
Intuitively, this comes from the fact that a constraint graph represents a relation
between valuations, while there is no such relation associated with the $\mathrm{CPre}^\delta_\rho$
operator. Instead, we introduce branching constraint graphs, which faithfully
represent the $\mathrm{CPre}^\delta_\rho$ operator: unlike the constraint graphs introduced so far, which
have a left layer and a right layer of variables, a branching constraint graph
still has a single left layer but several right layers.

We first define a branching constraint graph $G^\delta_{\mathrm{shrink}}$ associated with the operator
$\mathrm{shrink}_{[-\delta,\delta]}$ as follows. Its set of vertices is composed of three copies of
$\{X_0, X_1, \ldots, X_n\}$, denoted by primed, unprimed and doubly primed versions.
Edges are defined so as to encode the following constraints: $X'_i = X_i$ and
$X''_i = X_i$ for every $i \neq 0$, and $X'_0 = X_0 + \delta$ and $X''_0 = X_0 - \delta$. An instance of
this graph can be found in several occurrences in Fig. 2.
Proposition 1. Let $Z$ be a zone and $G^\delta_{\mathrm{shrink}}(Z)$ be the graph obtained from
$G^\delta_{\mathrm{shrink}}$ by adding on primed and doubly primed vertices the constraints defining $Z$
(as for $\mathrm{Pre}_\rho(Z)$ at the end of Sect. 3). Then the constraint on unprimed vertices
obtained from the shortest paths in $G^\delta_{\mathrm{shrink}}(Z)$ is equivalent to $\mathrm{shrink}_{[-\delta,\delta]}(Z)$.


Proof. Given a zone $Z$ and a real number $d$, we define $Z + d = \{v + d \mid v \in Z\}$.
One easily observes that $\mathrm{shrink}_{[-\delta,\delta]}(Z) = (Z + \delta) \cap (Z - \delta)$. The result follows
from the observation that taking two distinct copies of vertices and considering
shortest paths allows one to encode the intersection.
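The identity $\mathrm{shrink}_{[-\delta,\delta]}(Z) = (Z + \delta) \cap (Z - \delta)$ has a direct reading on a DBM: shifting all clocks by the same amount leaves every difference $x_i - x_j$ unchanged, so only the bounds against the zero clock tighten by $\delta$. The Python sketch below applies this directly to a DBM instead of building $G^\delta_{\mathrm{shrink}}$; it assumes non-strict bounds only, with `d[i][j]` bounding $v_i - v_j$ and index 0 the zero clock, and the function names are hypothetical.

```python
def close(d):
    """Floyd-Warshall closure of a DBM (non-strict bounds)."""
    n = len(d)
    d = [row[:] for row in d]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def shrink(zone, delta):
    """shrink_[-delta, delta](Z) = (Z + delta) ∩ (Z - delta) on a DBM.
    Returns the closed DBM, or None if the shrunk zone is empty."""
    n = len(zone)
    d = [row[:] for row in zone]
    for i in range(1, n):
        d[i][0] -= delta  # x_i <= c becomes x_i <= c - delta   (contributed by Z - delta)
        d[0][i] -= delta  # -x_i <= c becomes -x_i <= c - delta (contributed by Z + delta)
    d = close(d)
    if any(d[i][i] < 0 for i in range(n)):
        return None
    return d


# Z = {1 <= x <= 3}: shrinking by 0.5 gives {1.5 <= x <= 2.5}; by 1.5 it is empty.
Z = [[0, -1], [3, 0]]
```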


Then, for all edges $e = (\ell, g, R, \ell')$, we define the branching constraint graph
$G^\delta_e$ as the graph obtained by stacking (in this order) the branching constraint
graphs $G^{\geq\delta}_{\mathrm{time}}$, $G^\delta_{\mathrm{shrink}}$ and $G_{g,R}$. Note that two copies of the graph $G_{g,R}$ are needed,
to be connected to the two sets of vertices that are on the right of the graph
$G^\delta_{\mathrm{shrink}}$. This definition is extended in the expected way to a finite path $\rho$, yielding
the graph $G^\delta_\rho$. In this graph, there is a single set of vertices on the left, and $2^{|\rho|}$
sets of vertices on the right. As a direct consequence of the previous results on
the constraint graphs for time elapse, shrinking and guard/reset, one obtains:


Proposition 2. Let $Z$ be a zone and $\rho$ be a path. We let $G^\delta_\rho(Z)$ be the graph
obtained from $G^\delta_\rho$ by adding on every set of right vertices the constraints defining
$Z$. Then the constraint on the left layer of vertices obtained from the shortest
paths in $G^\delta_\rho(Z)$ is equivalent to $\mathrm{CPre}^\delta_\rho(Z)$.


An example of the graph $G^\delta_\rho(Z)$ for $\rho = e_1 e_2$, with the edges considered in Fig. 3, is
depicted in Fig. 2 (on the left).




Fig. 2. On the left, the branching constraint graph $G^\delta_{e_1 e_2}$ encoding the operator
$\mathrm{CPre}^\delta_{e_1 e_2}$, where $e_1$ and $e_2$ refer to the edges considered in Fig. 3. Dashed edges have weight
$(<, \cdot)$, plain edges have weight $(\leq, \cdot)$. Black edges (resp. orange edges, pink edges, red
edges, blue edges) are labelled by $(\cdot, 0)$ (resp. $(\cdot, -\delta)$, $(\cdot, \delta)$, $(\cdot, 2)$, $(\cdot, -2)$). On the right,
a decomposition of a path in a branching constraint graph $G^\delta_\rho$. (Color figure online)


We are now ready to prove the following result, a generalisation of [16,
Lemma 2], that will allow us to compute the greatest fixpoint of the operator
$\mathrm{CPre}^\delta_\rho$:

[Fig. 3 graphic; edge labels: $x_1 < 2$, $x_1 := 0$ and $x_2 > 2$, $x_2 := 0$]

Fig. 3. On the left, the constraint graph of the cycle $\rho = e_1 e_2$ between $\ell_1$ and $\ell_2$.
On the right, its normalised version: dashed edges have weight $(<, \cdot)$, plain edges have
weight $(\leq, \cdot)$, black edges have weight $(\cdot, 0)$, red edges have weight $(\cdot, 2)$ and blue edges
have weight $(\cdot, -2)$.


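Proposition 3. Let $\rho$ be a path and $\delta$ be a non-negative rational number. We
let $N = |X_0|^2$. If $(\mathrm{CPre}^\delta_\rho)^{2N+1}(\top) \neq (\mathrm{CPre}^\delta_\rho)^{2N}(\top)$, then $\nu X\, \mathrm{CPre}^\delta_\rho(X) = \emptyset$.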


Proof. Assume $(\mathrm{CPre}^\delta_\rho)^{2N+1}(\top) \neq (\mathrm{CPre}^\delta_\rho)^{2N}(\top)$ and consider the zones $(\mathrm{CPre}^\delta_\rho)^{N+1}(\top)$
(represented by the DBM $M_1$) and $(\mathrm{CPre}^\delta_\rho)^{N}(\top)$ (represented by the DBM $M_2$).
We have $M_1 \subsetneq M_2$, as otherwise the fixpoint would have already been reached
after $N$ steps. By Proposition 2, the zone corresponding to $M_1$ is associated with
shortest paths between vertices on the left in the graph $G^\delta_{\rho^{N+1}}$. In the sequel,
given a path $r$ in this graph, $w(r)$ denotes its weight. We distinguish two cases:


Case 1: $M_1 \subsetneq M_2$ because of the rational coefficients. Then, there exists an
entry $(x, y) \in X_0^2$ such that $M_1[x, y] < M_2[x, y]$. The value $M_1[x, y]$ is thus
associated with a shortest path between vertices $X$ and $Y$ in $G^\delta_{\rho^{N+1}}$. We fix a
shortest path of minimal length, and denote it by $r$. As the entry is strictly
smaller than in $M_2$, this shortest path must reach the last copy of the graph
$G^\delta_\rho$. This path can be interpreted as a traversal of the binary tree of depth
$|X_0|^2 + 1$, reaching at least one leaf. We can prove that this entails that there
exists a pair of clocks $(u, v) \in X_0^2$ appearing at two levels $i < j$ of this tree, and
a decomposition $r = r_1 r_2 r_3 r_4 r_5$ of the path, such that $w(r_2) + w(r_4) = (\lhd, d)$
with $d < 0$ (Property (†)). In addition, in this decomposition, $r_3$ is included
in subgraphs of levels $k \geq j$, and the pair of paths $(r_2, r_4)$ is called a return
path, following the terminology of [16]. This decomposition is depicted in Fig. 2
(on the right). Intuitively, Property (†) follows from the fact that, as $r_3$ is
included in subgraphs of levels $k \geq j$ and the final zone (on the right)
is the zone $\top$, which adds no edges, the concatenation $r' = r_1 r_3 r_5$ is also a valid
path from $X$ to $Y$ in $G^\delta_{\rho^{N+1}}$, and is shorter than $r$. We conclude using the fact
that $r$ has been chosen as a shortest path of minimal length.

Property (†) allows us to prove that the greatest fixpoint is empty. Indeed,
by considering iterations of $\rho$, one can repeat the return path associated with
$(r_2, r_4)$ and obtain paths from $X$ to $Y$ whose weights diverge towards $-\infty$.


Case 2: $M_1 \subsetneq M_2$ because of the ordering coefficients. We claim that this case
cannot occur. Indeed, one can show that the constants will not evolve anymore
after the $N$-th iteration of the fixpoint: the coefficients can only decrease by
changing from a non-strict inequality $(\leq, c)$ to a strict one $(<, c)$. This propagation
of strict inequalities is performed in at most $|X_0|^2$ additional steps, thus we
have $(\mathrm{CPre}^\delta_\rho)^{2N+1}(\top) = (\mathrm{CPre}^\delta_\rho)^{2N}(\top)$, yielding a contradiction.


Compared to the result of [16], the number of iterations needed before convergence
grows from $|X_0|^2$ to $2|X_0|^2$: this is due to the presence of strict and
non-strict inequalities, not considered in [16]. With the help of branching constraint
graphs, we have thus shown that the greatest fixpoint can be computed
in finite time: this can then be done directly with computations on zones (and
not on branching constraint graphs).
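The convergence bound of Proposition 3 yields a simple iteration scheme on zones. The Python sketch below is illustrative: `cpre` stands for some implementation of $\mathrm{CPre}^\delta_\rho$ on zones (assumed, not given here), an empty zone is represented by `None`, and the two toy interval operators `clamp` and `shrink_upper` are hypothetical stand-ins that only exercise the control flow. The $2N$ criterion is justified by Proposition 3 for $\mathrm{CPre}^\delta_\rho$, not for arbitrary monotone operators.

```python
def greatest_fixpoint(cpre, top, n_clocks):
    """Iterate cpre from top and compare the 2N-th and (2N+1)-th iterates,
    with N = |X_0|^2 and |X_0| = n_clocks + 1 (the clocks plus the zero clock).
    Equal iterates: that iterate is the greatest fixpoint. Different iterates:
    the greatest fixpoint is empty (Proposition 3), returned as None."""
    big_n = (n_clocks + 1) ** 2
    current = top
    for _ in range(2 * big_n):
        current = cpre(current)
        if current is None:
            return None
    following = cpre(current)
    return current if following == current else None


# Hypothetical one-clock operators on intervals (lo, hi), standing in for CPre.
def clamp(zone):
    lo, hi = max(zone[0], 1), min(zone[1], 5)
    return (lo, hi) if lo <= hi else None


def shrink_upper(zone):
    lo, hi = zone[0], zone[1] - 1
    return (lo, hi) if lo <= hi else None
```

With one clock, $N = 4$: `clamp` stabilises at `(1, 5)`, whereas `shrink_upper` still changes between the 8th and 9th iterates, so the sketch reports an empty fixpoint.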


Proposition 4. Given a path $\rho$ and a rational number $\delta$, the greatest fixpoint
$\nu X\, \mathrm{CPre}^\delta_\rho(X)$ can be computed in time polynomial in $|X|$ and $|\rho|$. As a consequence,
one can decide whether the controller has a strategy along a lasso $\rho_1\rho_2$
in $\mathcal{G}_\delta(A)$ in time polynomial in $|X|$ and $|\rho_1\rho_2|$.


Solving the Robust Controller Synthesis Problem for a Lasso. We have
shown how to decide whether the controller has a winning strategy for a fixed
rational value of $\delta$. We now aim at deciding whether there exists a positive value
of $\delta$ for which the controller wins the game $\mathcal{G}_\delta(A)$ (where $A$ is restricted to a
lasso $\rho_1\rho_2$). To this end, we will use a parametrised extension of DBMs, namely
shrunk DBMs, that were introduced in [24] in order to study the parametrised
state space of timed automata. Intuitively, our goal is to express shrinkings of
guards, e.g. sets of states satisfying constraints of the form $g = 1 + \delta < x < 2 - \delta \wedge 2\delta < y$,
where $\delta$ is a parameter to be chosen. Formally, a shrunk DBM
is a pair $(M, P)$, where $M$ is a DBM, and $P$ is a nonnegative integer matrix
called a shrinking matrix. This pair represents the set of valuations defined by
the DBM $M - \delta P$, for any given $\delta > 0$. Considering the example $g$, $M$ is the
guard $g$ obtained by setting $\delta = 0$, and $P$ is made of the integer multipliers
of $\delta$. We adopt the following notation: when we write a statement involving a
shrunk DBM $(M, P)$, we mean that for some $\delta_0 > 0$, the statement holds for
$M - \delta P$ for all $\delta \in (0, \delta_0]$. For instance, $(M, P) = \mathrm{Pretime}_{\geq\delta}((N, Q))$ means
that $M - \delta P = \mathrm{Pretime}_{\geq\delta}(N - \delta Q)$ for all small enough $\delta > 0$. Shrunk DBMs
are closed under standard operations on zones, and as a consequence, the $\mathrm{CPre}$
operator can be computed on shrunk DBMs:
Lemma 2. ([25]) Let $e = (\ell, g, R, \ell')$ be an edge and $(M, P)$ be a shrunk DBM.
Then, there exists a shrunk DBM $(N, Q)$, that we can compute in polynomial
time, such that $(N, Q) = \mathrm{CPre}^\delta_e((M, P))$.
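A shrunk DBM $(M, P)$ denotes, for each $\delta$, the concrete DBM $M - \delta P$. The Python sketch below instantiates the guard $g$ from the text at a given $\delta$ and tests emptiness with a Floyd-Warshall negative-cycle check. Assumptions of this sketch: the strict inequalities of $g$ are relaxed to non-strict ones, entry `m[i][j]` bounds $v_i - v_j$ over the variables (zero, $x$, $y$), and only a single value of $\delta$ is tested, whereas a statement on $(M, P)$ is meant for all small enough $\delta$. The names `instantiate` and `is_empty` are hypothetical.

```python
import math

INF = math.inf


def instantiate(m, p, delta):
    """The concrete DBM M - delta * P represented by the shrunk DBM (M, P)."""
    n = len(m)
    return [[m[i][j] - delta * p[i][j] for j in range(n)] for i in range(n)]


def is_empty(d):
    """Floyd-Warshall on a DBM; the zone is empty iff some cycle is negative."""
    n = len(d)
    d = [row[:] for row in d]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    return any(d[i][i] < 0 for i in range(n))


# Guard g (non-strict version): 1 + delta <= x <= 2 - delta and 2*delta <= y.
M = [[0, -1, 0],      # 0 - x <= -1,  0 - y <= 0
     [2, 0, INF],     # x - 0 <= 2
     [INF, INF, 0]]
P = [[0, 1, 2],       # the lower bound on x shrinks by delta, on y by 2*delta
     [1, 0, 0],       # the upper bound on x shrinks by delta
     [0, 0, 0]]
```

For instance, `instantiate(M, P, 0.1)` describes $1.1 \leq x \leq 1.9 \wedge 0.2 \leq y$, which is non-empty, while at $\delta = 0.6$ the bounds on $x$ cross and the zone is empty.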


Proposition 5. Given a path $\rho$, one can compute a shrunk DBM $(M, P)$ equal
to the greatest fixpoint of the operator $\mathrm{CPre}^\delta_\rho$. As a consequence, one can solve
the parametrised robust controller synthesis problem for a given lasso in time
polynomial in the number of clocks and in the length of the lasso.


Proof. The bound $2|X_0|^2$ identified previously does not depend on the value of $\delta$.
Hence the algorithm for computing a shrunk DBM representing the greatest
fixpoint proceeds as follows. It computes symbolically, using shrunk DBMs, the
$2|X_0|^2$-th and $(2|X_0|^2 + 1)$-th iterations of the operator $\mathrm{CPre}^\delta_\rho$, from the zone $\top$.
By monotonicity, the $(2|X_0|^2 + 1)$-th iteration is included in the $2|X_0|^2$-th. If the
two shrunk DBMs are equal, then they are also equal to the greatest fixpoint.
Otherwise, the greatest fixpoint is empty. To decide the robust controller
synthesis problem for a given lasso, one first computes a shrunk DBM representing
the greatest fixpoint associated with $\rho_2$ and, if it is not empty, one computes a new
shrunk DBM by applying to it the operator $\mathrm{CPre}^\delta_{\rho_1}$. Then, one checks whether
the valuation $0$ belongs to the resulting shrunk DBM.


Computing the Largest Admissible Perturbation. We say that a perturbation
$\delta$ is admissible if the controller wins the game $\mathcal{G}_\delta(A)$. The parametrised
robust controller synthesis problem, solved above only for a lasso, aims at deciding
whether there exists a positive admissible perturbation. A more ambitious
problem consists in determining the largest admissible perturbation.

The previous algorithm performs a bounded number ($2|X_0|^2$) of computations
of the $\mathrm{CPre}^\delta_\rho$ operator. Instead of focusing on arbitrarily small values using shrunk
DBMs as we did previously, we must perform a computation for all values of $\delta$. To
do so, we consider an extension of the (shrunk) DBMs in which each entry of the
matrix (which thus represents a clock constraint) is a piecewise affine function
of $\delta$. One can observe that all the operations involved in the computation of
the $\mathrm{CPre}^\delta_\rho$ operator can be performed symbolically w.r.t. $\delta$ using piecewise affine
functions. As a consequence, we obtain the following new result:


Proposition 6. We can compute the largest admissible perturbation of a lasso. 


Proof. Let $\rho_1\rho_2$ be a lasso. One first computes a symbolic representation, valid
for all values of $\delta$, of the greatest fixpoint of $\mathrm{CPre}^\delta_{\rho_2}$. To do so, one computes the
$2|X_0|^2$-th and $(2|X_0|^2 + 1)$-th iterations of this operator, from the zone $\top$. We denote
them by $M_1$ and $M_2$ respectively. By monotonicity, the inclusion $M_2(\delta) \subseteq M_1(\delta)$
holds for every $\delta > 0$. In addition, both $M_1$ and $M_2$ are decreasing w.r.t. $\delta$,
thus one can identify the value $\delta_0 = \inf\{\delta > 0 \mid M_2(\delta) \subsetneq M_1(\delta)\}$. Then, the
greatest fixpoint is equal to $M_1$ for $\delta < \delta_0$, and to the empty set for $\delta \geq \delta_0$.
As a second step, one applies the operator $\mathrm{CPre}^\delta_{\rho_1}$ to the greatest fixpoint.
We denote the result by $M$. To conclude, one can then compute and return the
value $\sup\{\delta \in [0, \delta_0) \mid 0 \in M(\delta)\}$ of maximal perturbation.
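The proof computes $\sup\{\delta \in [0, \delta_0) \mid 0 \in M(\delta)\}$ exactly, using piecewise affine entries. As a much simpler numerical stand-in (a swapped-in technique, not the symbolic method described above), one can approximate this value by bisection, assuming admissibility is downward closed in $\delta$. The function name `largest_admissible` is hypothetical, and the predicate in the usage example is made up: it checks that the interval $[1 + \delta, 2 - \delta]$ is non-empty.

```python
def largest_admissible(admissible, delta_max, tol=1e-6):
    """Approximate sup{delta in [0, delta_max) | admissible(delta)} by bisection.
    Assumes admissibility is downward closed in delta (if delta is admissible,
    so is every smaller value). Returns None if even delta = 0 is not admissible."""
    if not admissible(0.0):
        return None
    lo, hi = 0.0, delta_max  # invariant: lo is admissible, hi is not (or is delta_max)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if admissible(mid):
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    # Made-up check: the interval [1 + delta, 2 - delta] is non-empty iff delta <= 0.5.
    print(largest_admissible(lambda d: 1 + d <= 2 - d, 1.0))
```

Unlike the symbolic computation, bisection only brackets the supremum within `tol` and needs one full admissibility check per step.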
5 Synthesis of Robust Controllers


We are now ready to solve the parametrised robust controller synthesis problem,
that is, to find, if it exists, a lasso $\rho_1\rho_2$ and a perturbation $\delta$ such that the
controller wins the game $\mathcal{G}_\delta(A)$ when following the lasso $\rho_1\rho_2$ as a strategy. As
for the symbolic checking of emptiness of a Büchi timed language [17], we will
use a double forward analysis to exhaust all possible lassos, each being tested for
robustness by the techniques studied in the previous section: a first forward analysis
will search for $\rho_1$, a path from the initial location to an accepting location, and
a second forward analysis from each accepting location $\ell$ will find the cycle $\rho_2$
around $\ell$. Forward analysis means that we compute the successor zone $\mathrm{Post}_\rho(Z)$
when following path $\rho$ from zone $Z$.


Abstractions of Lassos. Before studying in more detail the two independent
forward analyses, we first study what information we must keep about $\rho_1$ and $\rho_2$
in order to still be able to test the robustness of the lasso $\rho_1\rho_2$. A classical
problem for robustness is the firing of a punctual transition, i.e. a transition where
the controller has a single choice of time delay: clearly such a firing will be robust
for no possible choice of parameter $\delta$. Therefore, we must at least forbid such
punctual transitions in our forward analyses. We thus introduce a non-punctual
successor operator $\mathrm{Post}^{np}_\rho$. It consists of the standard successor operator $\mathrm{Post}_\rho$
in the timed automaton $A^{np}$ obtained from $A$ by making strict every constraint
appearing in the guards ($1 \leq x \leq 2$ becomes $1 < x < 2$). The crucial point is that
if a positive delay $d$ can be taken by the controller while satisfying a set of strict
constraints, then other delays, close enough to $d$, are also possible. By analogy,
a region is said to be non-punctual if it contains two valuations separated by
a positive time delay. In particular, if such a region satisfies a constraint in $A$,
it also satisfies the corresponding strict constraint in $A^{np}$. Therefore, the controller
wins $\mathcal{G}_\delta(A)$ for some $\delta > 0$ if and only if he wins $\mathcal{G}_\delta(A^{np})$ for some $\delta > 0$.
The link between non-punctuality and robustness is as follows:


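Theorem 2. Let $\rho_1\rho_2$ be a lasso of the timed automaton. We have

$$\exists \delta > 0 \quad 0 \in \mathrm{CPre}^\delta_{\rho_1}\big(\nu X\, \mathrm{CPre}^\delta_{\rho_2}(X)\big) \iff \mathrm{Post}^{np}_{\rho_1}(0) \cap \Big(\bigcup_{\delta > 0} \nu X\, \mathrm{CPre}^\delta_{\rho_2}(X)\Big) \neq \emptyset$$

Proof. The proof of this theorem relies on three main ingredients: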


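1. the timed automaton $A^{np}$ allows one to compute $\bigcup_{\delta > 0} \mathrm{CPre}^\delta_e(Z')$ by the classical
predecessor operator: $\mathrm{Pre}^{np}_e(Z') = \bigcup_{\delta > 0} \mathrm{CPre}^\delta_e(Z')$;

2. for all edges $e$, and zones $Z$ and $Z'$, $Z \cap \mathrm{Pre}^{np}_e(Z') \neq \emptyset$ if and only if
$\mathrm{Post}^{np}_e(Z) \cap Z' \neq \emptyset$: this duality property on predecessor and successor relations always
holds, in particular in $A^{np}$. These two ingredients already imply that the
theorem holds for a path reduced to a single edge $e$;

3. we then prove the theorem by induction on the length of the path, using that
$\bigcup_{\delta > 0} \mathrm{CPre}^\delta_{e\rho}(Z) = \bigcup_{\delta > 0} \mathrm{CPre}^\delta_e\big(\bigcup_{\delta' > 0} \mathrm{CPre}^{\delta'}_\rho(Z)\big)$, due to the
monotonicity of the $\mathrm{CPre}^\delta_\rho$ operator.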
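Ingredient 2 holds for any binary relation, not only for zone-based ones: both sides state that some pair $(v, w)$ of the relation has $v \in Z$ and $w \in Z'$. The Python sketch below checks this exhaustively on a small, made-up finite relation; `pre`, `post` and `duality_holds` are hypothetical helper names.

```python
from itertools import chain, combinations


def pre(rel, target):
    """States having at least one successor in `target`."""
    return {v for (v, w) in rel if w in target}


def post(rel, source):
    """States reachable in one step from `source`."""
    return {w for (v, w) in rel if v in source}


def subsets(universe):
    items = list(universe)
    return [set(c) for c in
            chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))]


def duality_holds(rel, universe):
    """Check that Z ∩ Pre(Z') is non-empty iff Post(Z) ∩ Z' is non-empty,
    for all Z, Z' included in `universe`."""
    return all(bool(z & pre(rel, z2)) == bool(post(rel, z) & z2)
               for z in subsets(universe) for z2 in subsets(universe))


# Made-up relation on three states.
REL = {(1, 2), (2, 3), (3, 3)}
```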
Therefore, in order to test the robustness of the lasso $\rho_1\rho_2$, it is enough to
keep in memory only the sets $\mathrm{Post}^{np}_{\rho_1}(0)$ and $\bigcup_{\delta > 0} \nu X\, \mathrm{CPre}^\delta_{\rho_2}(X)$.


Non-punctual Forward Analysis. As a consequence of the previous theorem,
we can use a classical forward analysis of the timed automaton $A^{np}$ to look for
the prefix $\rho_1$ of the lasso $\rho_1\rho_2$. A classical inclusion check on zones allows one to stop
the exploration, this criterion being complete thanks to Theorem 2. Recall
that we consider only bounded clocks, hence the number of reachable
zones is finite, ensuring termination.
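The forward analysis with an inclusion-based stopping criterion follows the standard worklist pattern. A Python sketch under illustrative assumptions: zones are abstracted to intervals of a single clock, strictness is not tracked, and `successors` encodes a made-up two-location automaton whose guard $1 < x < 2$ is strict, as in $A^{np}$; the test `lo < hi` discards punctual intervals. The names `forward_search`, `included` and `successors` are hypothetical.

```python
from collections import deque


def forward_search(init, successors, included, is_target):
    """Breadth-first exploration of symbolic states (location, zone).
    A state is discarded when its zone is included in the zone of an
    already visited state at the same location."""
    visited = []
    queue = deque([init])
    while queue:
        loc, zone = queue.popleft()
        if any(l == loc and included(zone, z) for (l, z) in visited):
            continue
        visited.append((loc, zone))
        if is_target(loc):
            return loc, zone
        queue.extend(successors(loc, zone))
    return None


def included(z1, z2):
    """Interval inclusion [lo1, hi1] ⊆ [lo2, hi2]."""
    return z2[0] <= z1[0] and z1[1] <= z2[1]


def successors(loc, zone):
    """Made-up automaton: l0 --(1 < x < 2)--> l1 --(x := 0)--> l0, clock bounded by 5."""
    lo, hi = zone[0], 5  # time elapse, up to the clock bound
    if loc == "l0":
        lo, hi = max(lo, 1), min(hi, 2)
        if lo < hi:  # non-punctual: more than one delay satisfies the guard
            yield ("l1", (lo, hi))
    elif loc == "l1":
        yield ("l0", (0, 0))
```

Searching for an unreachable location terminates because the second visit to `l0` produces a zone included in the first one.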


Robust Cycle Search. We now perform a second forward analysis, from each
possible final location, to find a robust cycle around it. To this end, for each
cycle $\rho_2$, we must compute the zone $\bigcup_{\delta > 0} \nu X\, \mathrm{CPre}^\delta_{\rho_2}(X)$. This computation is
obtained by arguments developed in Sect. 4 (Proposition 4). To enumerate cycles
$\rho_2$, we can again use a classical forward exploration, starting from the universal
zone $\top$. Using zone inclusion to stop the exploration is not complete: considering
a path $\rho'_2$ reaching a zone $Z$ included in the zone $Z_2$ reachable using some $\rho_2$, $\rho'_2$
could be robustly iterable while $\rho_2$ is not. In order to ensure termination of our
analysis, we instead use inclusion checks on reachability relations. These tests are
performed using the technique developed in Sect. 3, based on constraint graphs
(Theorem 1). The correctness of this inclusion check is stated in the following
lemma, where $\mathrm{Reach}^{np}_\rho$ denotes the reachability relation associated with $\rho$ in the
automaton $A^{np}$. This result is derived from the analysis based on regions in [25].
Indeed, we can prove that the non-punctual reachability relation we consider
captures the existence of non-punctual aperiodic paths in the region automaton,
as considered in [25].


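Lemma 3. Let $\rho_1$ be a path from $\ell_0$ to some target location $\ell_t$. Let $\rho_2, \rho'_2$ be two
paths from $\ell_t$ to some location $\ell$, such that $\mathrm{Reach}^{np}_{\rho_2} \subseteq \mathrm{Reach}^{np}_{\rho'_2}$. For all paths
$\rho_3$ from $\ell$ to $\ell_t$, $\mathrm{Post}^{np}_{\rho_1}(0) \cap \big(\bigcup_{\delta > 0} \nu X\, \mathrm{CPre}^\delta_{\rho_2\rho_3}(X)\big) \neq \emptyset$ implies
$\mathrm{Post}^{np}_{\rho_1}(0) \cap \big(\bigcup_{\delta > 0} \nu X\, \mathrm{CPre}^\delta_{\rho'_2\rho_3}(X)\big) \neq \emptyset$.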


6 Case Study 


We implemented our algorithm in C++. To illustrate our approach, we present 
a case study on the regulation of train networks. Urban train networks in big 
cities are often particularly busy during rush hours: trains run at high frequency,
so even small delays due to incidents or passenger misbehavior can perturb the
traffic and end up causing large delays. Train companies thus apply regulation 
techniques: they slow down or accelerate trains, and modify waiting times in 
order to make sure that the traffic is fluid along the network. Computing robust 
schedules with provable guarantees is a difficult problem (see e.g. [9]). 

We study here a simplified model of a train network and aim at automatically
synthesizing a controller that regulates the network despite perturbations,
in order to ensure performance measures on total travel time for each train.
Consider a circular train network with $m$ stations $s_0, \ldots, s_{m-1}$ and $n$ trains. We
require that all trains are at distinct stations at all times. There is an interval
of delays $[\ell_i, u_i]$ attached to each station which bounds the travel time from $s_i$
to $s_{(i+1) \bmod m}$. Here the lower bound comes from physical limits (maximal allowed
speed, and travel distance) while the upper bound comes from operator specification
(e.g. it is not desirable for a train to remain at a station for more than
3 min). The objective of each train $i$ is to cycle on the network while completing
each tour within a given time interval $[t^i_1, t^i_2]$.

All timing requirements are naturally encoded with clocks. Given a model, we 
solve the robust controller synthesis problem in order to find a controller choosing 
travel times for all trains ensuring a Biichi condition (visiting sı infinitely often). 
Given the fact that trains cannot be at the same station at any given time, it 
suffices to state the Biichi condition only for one train, since its satisfaction of 
the condition necessarily implies that of all other trains. 

Let us present two representative instances and then comment on the performance
of the algorithm on a set of instances. Consider a network with two trains and $m$
stations, with $[\ell_i, u_i] = [200, 400]$ for each station $i$, and the objective of both
trains is the interval $[250 \cdot m, 350 \cdot m]$, that is, an average travel time between
stations that lies in $[250, 350]$. The algorithm finds an accepting lasso: intuitively,
by choosing $\delta$ small enough so that $m\delta < 50$, perturbations do not accumulate too
much and the controller can always choose delays for both trains and satisfy the
constraints. This case corresponds to scenario A in Fig. 4. Consider now the same
network but with two different objectives: $[0, 300 \cdot m]$ and $[300 \cdot m, \infty)$. Thus, one
train needs to complete each cycle in at most $300 \cdot m$ time units, while the other one
needs at least $300 \cdot m$ time units. A classical Büchi emptiness check reveals the
existence of an accepting lasso: it suffices to move each train in exactly 300 time
units between each station. This controller can even recover from perturbations for
a bounded number of cycles: for instance, if a train arrives late at a station, the
next travel time can be chosen smaller than 300. However, such corrections will cause
the distance between the two trains to decrease and, if such perturbations happen
regularly, the system will eventually enter a deadlock. Our algorithm detects that
there is no robust controller for the Büchi objective. This corresponds to scenario B
in Fig. 4.

Scenario | m  | n | #Clocks | robust? | time
A        | 6  | 2 | 4       | yes     | 4 s
B        | 6  | 2 | ?       | no      | ?
C        | 6  | 3 | 5       | no      | 1263 s
D        | 6  | 3 | 4       | yes     | 128 s
E        | ?  | ? | 2       | yes     | ?
F        | 6  | 4 | 2       | yes     | ?
G        | 6  | 4 | 8       |         | TO
H        | ?  | ? | 8       |         | TO
I        | 20 | 2 | 2       | yes     | 76 s
J        | 20 | 2 | 2       | yes     | 55 s
K        | 30 | 2 | 2       | yes     | 579 s

Fig. 4. Summary of experiments with different sizes. In each scenario, we assign a
different objective to a subset of trains. The answer is yes if a robust controller was
found, no if none exists. TO stands for a time-out of 30 min.

Figure 4 summarizes the outcome of our prototype implementation on other
scenarios. The tool was run on a 3.2 GHz Intel i7 processor running Linux, with
a 30 min time-out and 2 GB of memory. The performance is sensitive to the
number of clocks: on scenarios with 8 clocks, the algorithm ran out of time.


7 Conclusion 


Our case study illustrates the application of robust controller synthesis to small
and moderate-size problems. Our prototype relies on DBM libraries, which we use
with twice as many clocks to store the constraints of the normalised constraint
graphs. In order to scale to larger models, we plan to study extrapolation operators
and their integration in the computation of reachability relations, which
seems to be a challenging task. Different strategies can also be adopted for the
double forward analysis, such as switching between the two modes using heuristics,
or a parallel implementation.
Abstract. Successfully synthesizing controllers for complex dynamical
systems and specifications often requires leveraging domain knowledge
as well as making difficult computational or mathematical tradeoffs.
This paper presents a flexible and extensible framework for constructing
robust control synthesis algorithms and applies this to the traditional
abstraction-based control synthesis pipeline. It is grounded in the
theory of relational interfaces and provides a principled methodology to
seamlessly combine different techniques (such as dynamic precision grids,
refining abstractions while synthesizing, or decomposed control predecessors)
or create custom procedures to exploit an application’s intrinsic
structural properties. A Dubins vehicle is used as a motivating example
to showcase memory and runtime improvements.


Keywords: Control synthesis · Finite abstraction · Relational interface


1 Introduction 


A control synthesizer’s high-level goal is to automatically construct control software
that enables a closed-loop system to satisfy a desired specification. A vast
and rich literature contains results that mathematically characterize solutions 
to different classes of problems and specifications, such as the Hamilton-Jacobi- 
Isaacs PDE for differential games [3], Lyapunov theory for stabilization [8], and 
fixed-points for temporal logic specifications [11,17]. While many control synthe- 
sis problems have elegant mathematical solutions, there is often a gap between 
a solution’s theoretical characterization and the algorithms used to compute it. 
What data structures are used to represent the dynamics and constraints? What 
operations should those data structures support? How should the control synthe- 
sis algorithm be structured? Implementing solutions to the questions above can 
require substantial time. This problem is especially critical for computationally 
challenging problems, where it is often necessary to let the user rapidly identify 
and exploit structure through analysis or experimentation. 
[Fig. 1 diagram; boxes: Control Synthesis Formalism (Section 2), Relational Interfaces Framework (Section 3), User Intuition + System Structure, Coarsening Interfaces (Section 4), Refining and Sampling Interfaces (Section 5), Decomposed Control Predecessors (Section 6)]
Fig. 1. By expressing many different techniques within a common framework, users 
are able to rapidly develop methods to exploit system structure in controller synthesis. 


1.1 Bottlenecks in Abstraction-Based Control Synthesis 


This paper’s goal is to provide a framework for developing extensible tools for robust
controller synthesis. It was inspired in part by computational bottlenecks encountered
in control synthesizers that construct finite abstractions of continuous systems,
which we use as a target use case. A traditional abstraction-based control
synthesis pipeline consists of three distinct stages:


1. Abstracting the continuous state system into a finite automaton whose underlying
transitions faithfully mimic the original dynamics [21,23].

2. Synthesizing a discrete controller by leveraging data structures and symbolic 
reasoning algorithms to mitigate combinatorial state explosion. 

3. Refining the discrete controller into a continuous one. Feasibility of this step 
is ensured through the abstraction step. 
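For a safety specification, stage 2 is commonly instantiated as a greatest fixpoint over the finite abstraction: keep the states from which some input leads only to states that are still winning. The Python sketch below shows that classical computation on a made-up four-state abstraction; it is not the relational-interfaces framework of this paper, and `safety_controller` and `TRANS` are hypothetical names.

```python
def safety_controller(trans, inputs, safe):
    """Greatest fixpoint W = nu W. safe ∩ CPre(W) on a finite abstraction.
    trans maps (state, input) to the set of possible successor states; the
    non-determinism over-approximates the continuous dynamics. Returns the
    winning set and, for each winning state, the inputs that keep it winning."""
    def good(s, u, win):
        succ = trans.get((s, u), set())
        return bool(succ) and succ <= win  # blocking inputs are never allowed

    win = set(safe)
    while True:
        new = {s for s in win if any(good(s, u, win) for u in inputs)}
        if new == win:
            break
        win = new
    controller = {s: [u for u in inputs if good(s, u, win)] for s in win}
    return win, controller


# Made-up abstraction with states 0..3; state 3 is unsafe.
TRANS = {
    (0, "stay"): {0, 1}, (0, "move"): {3},
    (1, "stay"): {1, 2}, (1, "move"): {3},
    (2, "stay"): {3},    (2, "move"): {1},
    (3, "stay"): {3},
}
```

Stage 3 would then map each abstract input chosen by `controller` back to a concrete control action.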


This pipeline appears in the tools PESSOA [12] and SCOTS [19], which can exhibit
acute computational bottlenecks for high-dimensional and nonlinear system
dynamics. A common method to mitigate these bottlenecks is to exploit a specific
dynamical system’s topological and algebraic properties. In MASCOT [7]
and CoSyMA [14], multi-scale grids and hierarchical models capture notions of
state-space locality. One could incrementally construct an abstraction of the
system dynamics while performing the control synthesis step [10,15], as implemented
in the tools ROCS [9] and ARCS [4]. The abstraction overhead can also
be reduced by representing systems as a collection of components composed in
parallel [6,13]. These techniques have been developed in isolation and were not
previously interoperable.


1.2 Methodology 


Figure 1 depicts this paper’s methodology and organization. The existing control 
synthesis formalism does not readily lend itself to algorithmic modifications that 
reflect and exploit structural properties in the system and specification. We use 
the theory of relational interfaces [22] as a foundation and augment it to express 
control synthesis pipelines. Interfaces are used to represent both system models 
and constraints. A small collection of atomic operators manipulates interfaces 
and is powerful enough to reconstruct many existing control synthesis pipelines. 

One may also add new composite operators to encode desirable heuristics
that exploit structural properties in the system and specifications. The last
three sections encode the techniques for abstraction-based control synthesis from
Sect. 1.1 within the relational interfaces framework. By deliberately deconstructing
those techniques, then reconstructing them within a compositional framework,
it was possible to identify implicit or unnecessary assumptions, then generalize
or remove them. It also makes the aforementioned techniques interoperable
amongst themselves as well as with future techniques.

Interfaces come equipped with a refinement partial order that formalizes 
when one interface abstracts another. This paper focuses on preserving the 
refinement relation and sufficient conditions to refine discrete controllers back to 
concrete ones. Additional guarantees regarding completeness, termination, pre- 
cision, or decomposability can be encoded, but impose additional requirements 
on the control synthesis algorithm and are beyond the scope of this paper. 


1.3 Contributions 


To our knowledge, the application of relational interfaces to robust abstraction- 
based control synthesis is new. The framework’s building blocks consist of a col- 
lection of small, well understood operators that are nonetheless powerful enough 
to express many prior techniques. Encoding these techniques as relational inter- 
face operations forced us to simplify, formalize, or remove implicit assumptions 
in existing tools. The framework also exhibits numerous desirable features. 


1. It enables compositional tools for control synthesis by leveraging a theoretical 
foundation with compositionality built into it. This paper showcases a prin- 
cipled methodology to seamlessly combine the methods in Sect. 1.1, as well 
as construct new techniques. 

2. It enables a declarative approach to control synthesis by enforcing a strict
separation between the high-level algorithm and its low-level implementation.
We rely on the availability of an underlying data structure to encode and
manipulate predicates. Low-level predicate operations, while powerful, make
it easy to inadvertently violate the refinement property. Conforming to the
relational interface operations minimizes this danger.


This paper’s first half is domain agnostic and applicable to general robust control 
synthesis problems. The second half applies those insights to the finite abstrac- 
tion approach to control synthesis. A smaller Dubins vehicle example is used 
to showcase and evaluate different techniques and their computational gains, 
compared to the unoptimized problem. In an extended version of this paper 
available at [1], a 6D lunar lander example leverages all techniques in this paper 
and introduces a few new ones. 


1.4 Notation 


Let $=$ be an assertion that two objects are mathematically equivalent; as a
special case '$\equiv$' is used when those two objects are sets. In contrast, the operator
'$==$' checks whether two objects are equivalent, returning true if they are and
false otherwise. A special instance of '$==$' is logical equivalence '$\Leftrightarrow$'.
Variables are denoted by lower case letters. Each variable $v$ is associated with
a domain of values $D(v)$ that is analogous to the variable's type. A composite
variable is a set of variables and is analogous to a bundle of wrapped wires. From
a collection of variables $v_1, \ldots, v_M$ a composite variable $v$ can be constructed
by taking the union $v = v_1 \cup \ldots \cup v_M$ and the domain $D(v) = \prod_{i=1}^{M} D(v_i)$.
Note that the variables $v_1, \ldots, v_M$ above may themselves be composite. As an
example, if $v$ is associated with an $M$-dimensional Euclidean space $\mathbb{R}^M$, then it is a
composite variable that can be broken apart into a collection of atomic variables
$v_1, \ldots, v_M$ where $D(v_i) = \mathbb{R}$ for all $i \in \{1, \ldots, M\}$. The technical results herein
do not distinguish between composite and atomic variables.

Predicates are functions that map variable assignments to a Boolean value.
Predicates that stand in for expressions/formulas are denoted with capital letters.
Predicates $P$ and $Q$ are logically equivalent (denoted by $P \equiv Q$) if and
only if $P \Rightarrow Q$ and $Q \Rightarrow P$ are true for all variable assignments. The universal
and existential quantifiers $\forall$ and $\exists$ eliminate variables and yield new predicates.
Predicates $\exists w P$ and $\forall w P$ do not depend on $w$. If $w$ is a composite variable
$w = w_1 \cup \ldots \cup w_N$, then $\exists w P$ is simply shorthand for $\exists w_1 \ldots \exists w_N P$.
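Over finite domains the quantifiers above can be read operationally. The following minimal Python sketch assumes that predicates are modeled as functions from assignment dictionaries to Booleans and that the hypothetical dictionary `DOM` supplies each variable's finite domain $D(v)$. It shows how $\exists$ and $\forall$ eliminate a variable, and how quantifying a composite variable reduces to quantifying its atoms one at a time.

```python
# Minimal sketch: predicates as Python functions over assignment dicts.
# DOM is a hypothetical stand-in for the finite domains D(v).
DOM = {"a": [0, 1, 2], "b": [0, 1, 2]}


def exists(w, P):
    """Return the predicate (exists w. P); the result ignores any value of w."""
    return lambda asg: any(P({**asg, w: val}) for val in DOM[w])


def forall(w, P):
    """Return the predicate (forall w. P); the result ignores any value of w."""
    return lambda asg: all(P({**asg, w: val}) for val in DOM[w])


def exists_composite(ws, P):
    """For a composite variable w = w1 U ... U wN, quantify each atom in turn."""
    for w in ws:
        P = exists(w, P)
    return P


# Example predicate over the variables {a, b}.
P = lambda asg: asg["a"] + asg["b"] == 3
EbP = exists("b", P)  # depends only on a
AbP = forall("b", P)  # depends only on a
```

Because the quantified variable is overwritten inside the closure, `EbP` gives the same answer whether or not the caller also supplies a value for `b`.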


2 Control Synthesis for a Motivating Example 


As a simple, instructive example, consider a planar Dubins vehicle that is tasked
with reaching a desired location. Let $x = \{p_x, p_y, \theta\}$ be the collection of state
variables, $u = \{v, \omega\}$ be a collection of input variables to be controlled, $x^+ =
\{p_x^+, p_y^+, \theta^+\}$ represent state variables at a subsequent time step, and $L = 1.4$ be
a constant representing the vehicle length. The constraints

$$p_x^+ == p_x + v\cos(\theta) \tag{$F_x$}$$
$$p_y^+ == p_y + v\sin(\theta) \tag{$F_y$}$$
$$\theta^+ == \theta + \tfrac{v}{L}\sin(\omega) \tag{$F_\theta$}$$

characterize the discrete-time dynamics. The continuous state domain is $D(x) =
[-2, 2] \times [-2, 2] \times [-\pi, \pi)$, where the last component is periodic so $-\pi$ and $\pi$
are identical values. The input domains are $D(v) = \{0.25, 0.5\}$ and $D(\omega) =
\{-1.5, 0, 1.5\}$.
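The three constraints describe a deterministic state update. As a sketch, assuming the heading update reads $\theta^+ = \theta + \frac{v}{L}\sin(\omega)$ and that the periodic heading is wrapped back into $[-\pi, \pi)$, one step of the dynamics over the finite input domains could be written as follows. The names `wrap_angle` and `dubins_step` are illustrative and are not taken from the paper's toolbox.

```python
import math

L = 1.4  # vehicle length, as in the text


def wrap_angle(theta):
    """Map an angle into the periodic interval [-pi, pi)."""
    return (theta + math.pi) % (2 * math.pi) - math.pi


def dubins_step(px, py, theta, v, omega):
    """One step of the discrete-time constraints (F_x), (F_y), (F_theta)."""
    px_next = px + v * math.cos(theta)
    py_next = py + v * math.sin(theta)
    theta_next = wrap_angle(theta + (v / L) * math.sin(omega))
    return px_next, py_next, theta_next


# Finite input domains D(v) and D(omega) from the text.
V_DOMAIN = (0.25, 0.5)
OMEGA_DOMAIN = (-1.5, 0.0, 1.5)
successors = [dubins_step(0.0, 0.0, 0.0, v, w) for v in V_DOMAIN for w in OMEGA_DOMAIN]
```

With six input combinations, each state has at most six successors. Abstraction later replaces these exact points with sets of grid cells.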

Let predicate $F = F_x \wedge F_y \wedge F_\theta$ represent the monolithic system dynamics.
Predicate $T$ depends only on $x$ and represents the target set $[-0.4, 0.4] \times
[-0.4, 0.4] \times [-\pi, \pi)$, encoding that the vehicle's position must reach a square
with any orientation. Let $Z$ be a predicate that depends on variable $x^+$ and
encodes a collection of states at a future time step. Equation (1) characterizes
the robust controlled predecessor, which takes $Z$ and computes the set of states
from which there exists a non-blocking assignment to $u$ that guarantees $x^+$ will
satisfy $Z$, despite any non-determinism contained in $F$. The term $\exists x^+ F$ prevents
state-control pairs from blocking, while $\forall x^+ (F \Rightarrow Z)$ encodes the state-control
pairs that guarantee satisfaction of $Z$.


$$\mathrm{cpre}(F, Z) = \exists u \big(\exists x^+ F \wedge \forall x^+ (F \Rightarrow Z)\big). \tag{1}$$
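For a finite transition relation, Eq. (1) can be evaluated by brute force. The sketch below uses a hypothetical toy system rather than the Dubins vehicle: states $0,\ldots,5$ on a line, inputs $\pm 1$, and an adversarial disturbance of $0$ or $+1$, with successors clipped to the state space. Python sets stand in for sink predicates.

```python
# Hypothetical toy system standing in for F(x, u, x+).
X = range(6)       # finite state domain D(x) = D(x+)
U = (-1, 1)        # finite input domain D(u)


def F(x, u, xn):
    """Transition predicate: x+ is x + u + d for a disturbance d in {0, 1}, clipped."""
    return xn in {min(max(x + u + d, 0), 5) for d in (0, 1)}


def cpre(F, Z):
    """Eq. (1): states with some u such that (exists x+. F) and (forall x+. F => Z)."""
    return {
        x
        for x in X
        if any(
            any(F(x, u, xn) for xn in X)                     # u does not block
            and all(not F(x, u, xn) or xn in Z for xn in X)  # every successor is in Z
            for u in U
        )
    }
```

For instance, `cpre(F, {4, 5})` evaluates to `{3, 4, 5}`. State 2 is excluded because, under either input, the disturbance can leave the system outside $Z$.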
The controlled predecessor is used to solve safety and reach games. We can
compute the region from which the target $T$ (respectively, safe set $S$) can be reached
(made invariant) via an iteration of an appropriate reach (safe) operator. Both
iterations are given by:


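$$\text{Reach Iter:} \quad Z_0 = \bot, \quad Z_{i+1} = \mathrm{reach}(F, Z_i, T) = \mathrm{cpre}(F, Z_i) \vee T. \tag{2}$$
$$\text{Safety Iter:} \quad Z_0 = S, \quad Z_{i+1} = \mathrm{safe}(F, Z_i, S) = \mathrm{cpre}(F, Z_i) \wedge S. \tag{3}$$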
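Iterations (2) and (3) can be run until two consecutive iterates coincide. The sketch below defines a brute-force `cpre` over a hypothetical toy system (states $0,\ldots,5$, inputs $\pm 1$, disturbance $0$ or $+1$, successors clipped) and stops at the first repeated iterate. The `max_iter` bound is an assumed safeguard, since termination is not guaranteed in general.

```python
X = range(6)
U = (-1, 1)


def F(x, u, xn):
    """Hypothetical toy dynamics: x+ = clip(x + u + d) with disturbance d in {0, 1}."""
    return xn in {min(max(x + u + d, 0), 5) for d in (0, 1)}


def cpre(F, Z):
    """Brute-force robust controlled predecessor, Eq. (1)."""
    return {x for x in X if any(
        any(F(x, u, y) for y in X) and all(not F(x, u, y) or y in Z for y in X)
        for u in U)}


def reach_basin(F, T, max_iter=100):
    """Iteration (2): Z_0 = bottom (empty set), Z_{i+1} = cpre(F, Z_i) | T."""
    Z = set()
    for _ in range(max_iter):
        Z_next = cpre(F, Z) | set(T)
        if Z_next == Z:
            break
        Z = Z_next
    return Z


def safe_set(F, S, max_iter=100):
    """Iteration (3): Z_0 = S, Z_{i+1} = cpre(F, Z_i) & S."""
    Z = set(S)
    for _ in range(max_iter):
        Z_next = cpre(F, Z) & set(S)
        if Z_next == Z:
            break
        Z = Z_next
    return Z
```

In this toy system the reach basin of $\{5\}$ grows one state per iteration until it covers every state. The safe set $\{2, 3\}$ shrinks to the empty set because the disturbance eventually pushes every trajectory out.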


The above iterations are not guaranteed to reach a fixed point in a finite
number of iterations, except under certain technical conditions [21]. Figure 2 depicts
an approximate region where the controller can force the Dubins vehicle to
enter $T$. We showcase different improvements relative to a baseline script used to
generate Fig. 2. A toolbox that adopts this paper's framework is being actively
developed and is open sourced at [2]. It is written in Python 3.6 and uses the dd
package as an interface to CUDD [20], a C/C++ library for constructing and
manipulating binary decision diagrams (BDDs). All experiments were run on a single core
of a 2013 MacBook Pro with a 2.4 GHz Intel Core i7 and 8 GB of RAM.

Fig. 2. Approximate solution to the Dubins vehicle reach game visualized as
a subset of the state space.

The following section uses relational interfaces to represent the controlled
predecessor $\mathrm{cpre}(\cdot)$ and iterations (2) and (3) as a computational pipeline.
Subsequent sections show how modifying this pipeline leads to favorable theoretical
properties and computational gains.




3 Relational Interfaces 


Relational interfaces are predicates augmented with annotations about each variable's
role as an input or output¹. They abstract away a component's internal
implementation and only encode an input-output relation.


Definition 1 (Relational Interface [22]). An interface $M(i, o)$ consists of a
predicate $M$ over a set of input variables $i$ and output variables $o$.


For an interface $M(i, o)$, we call $(i, o)$ its input-output signature. An interface is a
sink if it contains no outputs and has a signature like $(i, \emptyset)$, and a source if it contains
no inputs like $(\emptyset, o)$. Sink and source interfaces can be interpreted as sets,
whereas input-output interfaces are relations. Interfaces encode relations through
their predicates and can capture features such as non-deterministic outputs or
blocking (i.e., disallowed, error) inputs. A system blocks for an input assignment
if there does not exist a corresponding output assignment that satisfies the
interface relation. Blocking is a critical property used to declare requirements;
sink interfaces impose constraints by modeling constraint violations as blocking
inputs. Outputs, on the other hand, exhibit non-determinism, which is treated as
an adversary. When one interface's outputs are connected to another's inputs,
the outputs seek to cause blocking whenever possible.

¹ Relational interfaces closely resemble assume-guarantee contracts [16]; we opt to use
relational interfaces because inputs and outputs play a more prominent role.
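On finite domains, an interface is a predicate paired with its input-output signature, and testing for blocking amounts to searching for a witnessing output. The following sketch checks whether an input assignment blocks. It assumes a hypothetical `Interface` record and hypothetical finite domains in `DOM`. A sink has no outputs, so it blocks exactly when its predicate is false.

```python
from itertools import product
from typing import Callable, NamedTuple, Tuple

DOM = {"i": [0, 1, 2], "o": [0, 1, 2, 3]}  # hypothetical finite domains


class Interface(NamedTuple):
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]
    pred: Callable[[dict], bool]


def assignments(variables):
    """Enumerate every assignment (as a dict) to the given variables."""
    for values in product(*(DOM[v] for v in variables)):
        yield dict(zip(variables, values))


def blocks(M, inp):
    """M blocks on inp if no output assignment satisfies M's relation."""
    return not any(M.pred({**inp, **out}) for out in assignments(M.outputs))


# Input-output interface: o is a non-deterministic choice in {i, i + 1};
# the input i == 2 is disallowed (blocking).
M = Interface(("i",), ("o",), lambda a: a["i"] < 2 and a["o"] in (a["i"], a["i"] + 1))
# Sink interface encoding the constraint i != 0 as a blocking input.
C = Interface(("i",), (), lambda a: a["i"] != 0)
```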


3.1 Atomic and Composite Operators 


Operators are used to manipulate interfaces by taking interfaces and variables
as inputs and yielding another interface. We will show how the controlled predecessor
$\mathrm{cpre}(\cdot)$ in (1) can be constructed by composing operators appearing in
[22] and one additional one. The first, output hiding, removes interface outputs.


Definition 2 (Output Hiding [22]). Output hiding operator $\mathrm{ohide}(w, F)$
over interface $F(i, o)$ and outputs $w$ yields an interface with signature $(i, o \setminus w)$.

$$\mathrm{ohide}(w, F) = \exists w F \tag{4}$$


Existentially quantifying out $w$ ensures that the input-output behavior over the
unhidden variables is still consistent with potential assignments to $w$. The operator
$nb(\cdot)$ is a special variant of $\mathrm{ohide}(\cdot)$ that hides all outputs, yielding a sink
encoding all non-blocking inputs to the original interface.


Definition 3 (Nonblocking Inputs Sink). Given an interface $F(i, o)$, the
nonblocking operation $nb(F)$ yields a sink interface with signature $(i, \emptyset)$ and
predicate $nb(F) = \exists o F$. If $F(i, \emptyset)$ is a sink interface, then $nb(F) = F$ yields
itself. If $F(\emptyset, o)$ is a source interface, then $nb(F) = \bot$ if and only if $F = \bot$;
otherwise $nb(F) = \top$.
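Output hiding and the non-blocking sink are both existential quantification over outputs. The brute-force sketch below assumes a hypothetical `Interface` record and hypothetical finite domains in `DOM`. Because a sink has no outputs to hide, `nb` returns the sink's own predicate. For a source, the result is true exactly when the source is non-empty, which matches the special cases in Definition 3.

```python
from itertools import product
from typing import Callable, NamedTuple, Tuple

DOM = {"i": [0, 1, 2], "o": [0, 1], "w": [0, 1]}  # hypothetical finite domains


class Interface(NamedTuple):
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]
    pred: Callable[[dict], bool]


def assignments(variables):
    for values in product(*(DOM[v] for v in variables)):
        yield dict(zip(variables, values))


def ohide(w, F):
    """Eq. (4): existentially quantify the outputs w; the new outputs are o minus w."""
    pred = lambda a: any(F.pred({**a, **wa}) for wa in assignments(w))
    return Interface(F.inputs, tuple(o for o in F.outputs if o not in w), pred)


def nb(F):
    """Definition 3: hide every output, leaving the sink of non-blocking inputs."""
    return ohide(F.outputs, F)


# F(i, {o, w}): o is the parity of i, w is unconstrained, and input i == 2 blocks.
F = Interface(("i",), ("o", "w"), lambda a: a["i"] < 2 and a["o"] == a["i"] % 2)
```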


The interface composition operator takes multiple interfaces and “collapses” 
them into a single input-output interface. It can be viewed as a generalization 
of function composition in the special case where each interface encodes a total 
function (i.e., deterministic output and inputs never block). 


Definition 4 (Interface Composition [22]). Let $F_1(i_1, o_1)$ and $F_2(i_2, o_2)$ be
interfaces with disjoint output variables $o_1 \cap o_2 = \emptyset$ and $i_1 \cap o_2 = \emptyset$, which
signifies that $F_2$'s outputs may not be fed back into $F_1$'s inputs. Define new
composite variables

$$io_{12} = o_1 \cap i_2 \tag{5}$$
$$i_{12} = (i_1 \cup i_2) \setminus io_{12} \tag{6}$$
$$o_{12} = o_1 \cup o_2. \tag{7}$$

Composition $\mathrm{comp}(F_1, F_2)$ is an interface with signature $(i_{12}, o_{12})$ and predicate

$$F_1 \wedge F_2 \wedge \forall o_{12} \big(F_1 \Rightarrow nb(F_2)\big). \tag{8}$$


Interface subscripts may be swapped if instead $F_2$'s outputs are fed into $F_1$.
Interfaces $F_1$ and $F_2$ are composed in parallel if $io_{21} = \emptyset$ holds in addition to
$io_{12} = \emptyset$. Equation (8) under parallel composition reduces to $F_1 \wedge F_2$ (Lemma
6.4 in [22]) and $\mathrm{comp}(\cdot)$ is commutative and associative. If $io_{12} \neq \emptyset$, then they
are composed in series and the composition operator is only associative. Any
acyclic interconnection can be composed into a single interface by systematically
applying Definition 4's binary composition operator. Non-deterministic outputs
are interpreted to be adversarial. Series composition of interfaces has a built-in
notion of robustness to account for $F_1$'s non-deterministic outputs and blocking
inputs to $F_2$ over the shared variables $io_{12}$. The term $\forall o_{12} (F_1 \Rightarrow nb(F_2))$ in
Eq. (8) is a predicate over the composition's input set $i_{12}$. It ensures that if a
potential output of $F_1$ may cause $F_2$ to block, then $\mathrm{comp}(F_1, F_2)$ must preemptively
block.
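The robustness term in Eq. (8) is what separates interface composition from plain conjunction. The brute-force sketch below composes two interfaces in series over hypothetical finite domains in `DOM`. Because $F_1 \Rightarrow nb(F_2)$ does not mention $o_2$, the sketch quantifies only over $o_1$, which is equivalent to quantifying over $o_{12}$. In the example, $F_1$ may emit $b = 2$ on input $a = 1$, which $F_2$ rejects, so the composite preemptively blocks $a = 1$.

```python
from itertools import product
from typing import Callable, NamedTuple, Tuple

DOM = {"a": [0, 1], "b": [0, 1, 2], "c": [0, 1, 2]}  # hypothetical finite domains


class Interface(NamedTuple):
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]
    pred: Callable[[dict], bool]


def assignments(variables):
    for values in product(*(DOM[v] for v in variables)):
        yield dict(zip(variables, values))


def nb(F):
    """Predicate of the non-blocking inputs sink: exists o. F."""
    return lambda a: any(F.pred({**a, **o}) for o in assignments(F.outputs))


def comp(F1, F2):
    """Interface composition, Eqs. (5)-(8), with F1's outputs feeding F2."""
    io12 = tuple(v for v in F1.outputs if v in F2.inputs)                 # (5)
    i12 = tuple(dict.fromkeys(v for v in F1.inputs + F2.inputs
                              if v not in io12))                          # (6)
    o12 = F1.outputs + F2.outputs                                         # (7)
    nb2 = nb(F2)

    def pred(a):                                                          # (8)
        robust = all(not F1.pred({**a, **s}) or nb2({**a, **s})
                     for s in assignments(F1.outputs))
        return F1.pred(a) and F2.pred(a) and robust

    return Interface(i12, o12, pred)


# F1(a, b): b is non-deterministically a or a + 1.
F1 = Interface(("a",), ("b",), lambda d: d["b"] in (d["a"], d["a"] + 1))
# F2(b, c): c copies b, but F2 blocks on b == 2.
F2 = Interface(("b",), ("c",), lambda d: d["b"] < 2 and d["c"] == d["b"])
C = comp(F1, F2)


def accepts(M, inp):
    """True if some output assignment satisfies M for this input."""
    return any(M.pred({**inp, **o}) for o in assignments(M.outputs))
```

Plain conjunction $F_1 \wedge F_2$ would still accept $a = 1$ through the output $b = 1$. The robustness term removes it because the adversarial output $b = 2$ is also possible.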

The final atomic operator is input hiding, which may only be applied to sinks. 
If the sink is viewed as a constraint, an input variable is “hidden” by an angelic 
environment that chooses an input assignment to satisfy the constraint. This 
operator is analogous to projecting a set into a lower dimensional space. 


Definition 5 (Hiding Sink Inputs). Input hiding operator $\mathrm{ihide}(w, F)$ over
sink interface $F(i, \emptyset)$ and inputs $w$ yields an interface with signature $(i \setminus w, \emptyset)$.

$$\mathrm{ihide}(w, F) = \exists w F \tag{9}$$


Unlike the composition and output hiding operators, this operator is not included
in the standard theory of relational interfaces [22] and was added to encode the
controlled predecessor introduced subsequently in Eq. (10).


3.2 Constructing Control Synthesis Pipelines 


The robust controlled predecessor (1) can be expressed through operator com- 
position. 


Proposition 1. The controlled predecessor operator (10) yields a sink interface
with signature $(x, \emptyset)$ and predicate equivalent to the predicate in (1).

$$\mathrm{cpre}(F, Z) = \mathrm{ihide}(u, \mathrm{ohide}(x^+, \mathrm{comp}(F, Z))). \tag{10}$$


The simple proof is provided in the extended version at [1]. Proposition 1 signifies
that controlled predecessors can be interpreted as an instance of robust
composition of interfaces, followed by variable hiding. It can be shown that
$\mathrm{safe}(F, Z, S) = \mathrm{comp}(\mathrm{cpre}(F, Z), S)$ because $S(x, \emptyset)$ and $\mathrm{cpre}(F, Z)$ would be
composed in parallel.² Figure 3 shows a visualization of the safety game's fixed
point iteration from the point of view of relational interfaces. Starting from
the right-most sink interface $S$ (equivalent to $Z_0$), the iteration (3) constructs a
sequence of sink interfaces $Z_1, Z_2, \ldots$ encoding relevant subsets of the state space.
The numerous $S(x, \emptyset)$ interfaces impose constraints and can be interpreted as
monitors that raise errors if the safety constraint is violated.
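Proposition 1 can be spot-checked by brute force on a small example. The sketch below compares Eq. (1) with the operator pipeline of Eq. (10) for every possible target set $Z$. It uses a hypothetical toy system: states $0,\ldots,5$, inputs $\pm 1$, a disturbance of $0$ or $+1$, and successors clipped to the state space. Because $Z$ is a sink, $nb(Z) = Z$ in the robustness term of composition.

```python
from itertools import combinations

X, U = range(6), (-1, 1)   # hypothetical finite state and input domains


def F(x, u, xn):
    """Toy dynamics: x+ = clip(x + u + d) for a disturbance d in {0, 1}."""
    return xn in {min(max(x + u + d, 0), 5) for d in (0, 1)}


def cpre_direct(Z):
    """Eq. (1): exists u. (exists x+. F) and (forall x+. F => Z)."""
    return {x for x in X if any(
        any(F(x, u, y) for y in X) and all(not F(x, u, y) or y in Z for y in X)
        for u in U)}


def cpre_pipeline(Z):
    """Eq. (10): ihide(u, ohide(x+, comp(F, Z))), one operator at a time."""
    def comp_FZ(x, u, xn):
        # Eq. (8) with o12 = {x+}; nb(Z) = Z because Z is a sink.
        robust = all(not F(x, u, y) or y in Z for y in X)
        return F(x, u, xn) and xn in Z and robust

    def ohide_xn(x, u):
        return any(comp_FZ(x, u, xn) for xn in X)            # exists x+

    return {x for x in X if any(ohide_xn(x, u) for u in U)}  # exists u


def agree_on_all_targets():
    """Compare both forms for every subset Z of the state space."""
    subsets = (set(c) for r in range(len(X) + 1) for c in combinations(X, r))
    return all(cpre_direct(Z) == cpre_pipeline(Z) for Z in subsets)
```

The two forms agree because, once $\forall x^+ (F \Rightarrow Z)$ holds, $\exists x^+ (F \wedge Z)$ and $\exists x^+ F$ coincide.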


² Disjunctions over sinks are required to encode $\mathrm{reach}(\cdot)$. This will be enabled by the
shared refinement operator defined in Definition 10.
Fig. 3. Safety control synthesis iteration (3) depicted as a sequence of sink interfaces.


3.3 Modifying the Control Synthesis Pipeline 


Equation (10)'s definition of $\mathrm{cpre}(\cdot)$ is oblivious to the domains of variables
$x$, $u$, and $x^+$. This generality is useful for describing a problem and serving as a
blank template. Whenever problem structure exists, pipeline modifications refine
the general algorithm into a form that reflects the specific problem instance.
They also allow a user to inject implicit preferences into a problem and reduce
computational bottlenecks or to refine a solution. The subsequent sections apply
this philosophy to the abstraction-based control techniques from Sect. 1.1:


— Sect. 4: Coarsening interfaces reduces the computational complexity of a prob- 
lem by throwing away fine grain information. The synthesis result is conser- 
vative but the degree of conservatism can be modified. 

— Sect. 5: Refining interfaces decreases result conservatism. Refinement in com- 
bination with coarsening allows one to dynamically modulate the complexity 
of the problem as a function of multiple criteria such as the result granularity 
or minimizing computational resources. 

— Sect. 6: If the dynamics or specifications are decomposable, then the controlled
predecessor operator can be broken apart to reflect that decomposition.


These sections do more than simply reconstruct existing techniques in the lan- 
guage of relational interfaces. They uncover some implicit assumptions in existing 
tools and either remove them or make them explicit. Minimizing the number of 
assumptions ensures applicability to a diverse collection of systems and specifi- 
cations and compatibility with future algorithmic modifications. 


4 Interface Abstraction via Quantization 


A key motivator behind abstraction-based control synthesis is that computing 
the game iterations from Eqs. (2) and (3) exactly is often intractable for high- 
dimensional nonlinear dynamics. Termination is also not guaranteed. Quantizing 
(or “abstracting”) continuous interfaces into a finite counterpart ensures that 
each predicate operation of the game terminates in finite time but at the cost of 
the solution’s precision. Finer quantization incurs a smaller loss of precision but 
can cause the memory and computational requirements to store and manipulate
the symbolic representation to exceed machine resources. 

This section first introduces the notion of interface abstraction as a refine- 
ment relation. We define the notion of a quantizer and show how it is a simple 
generalization of many existing quantizers in the abstraction-based control lit- 
erature. Finally, we show how one can inject these quantizers anywhere in the 
control synthesis pipeline to reduce computational bottlenecks. 


4.1 Theory of Abstract Interfaces 


While a controller synthesis algorithm can analyze a simpler model of the dynam- 
ics, the results have no meaning unless they can be extrapolated back to the orig- 
inal system dynamics. The following interface refinement condition formalizes a 
condition when this extrapolation can occur. 


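Definition 6 (Interface Refinement [22]). Let $F(i, o)$ and $\hat{F}(\hat{i}, \hat{o})$ be interfaces.
$\hat{F}$ is an abstraction of $F$ if and only if $i = \hat{i}$, $o = \hat{o}$, and

$$nb(\hat{F}) \Rightarrow nb(F) \tag{11}$$
$$\big(nb(\hat{F}) \wedge F\big) \Rightarrow \hat{F} \tag{12}$$

are valid formulas. This relationship is denoted by $\hat{F} \preceq F$.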


Definition 6 imposes two main requirements between a concrete and abstract
interface. Equation (11) encodes the condition where if $\hat{F}$ accepts an input,
then $F$ must also accept it; that is, the abstract component is more aggressive
with rejecting invalid inputs. Second, by (12), if both systems accept the input,
then the abstract output set is a superset of the concrete interface's output set.
The abstract interface is a conservative representation of the concrete interface
because the abstraction accepts fewer inputs and exhibits more non-deterministic
outputs. If both interfaces are sink interfaces, then $\hat{F} \preceq F$ reduces to
$\hat{F} \subseteq F$ when $\hat{F}$ and $F$ are interpreted as sets. If both are source interfaces, then the
set containment direction is flipped and $\hat{F} \preceq F$ reduces to $F \subseteq \hat{F}$.

The refinement relation satisfies the required reflexivity, transitivity, and
antisymmetry properties to be a partial order [22] and is depicted in Fig. 4.
This order has a bottom element $\bot$, which is a universal abstraction. Conveniently,
the bottom element $\bot$ signifies both Boolean false and the bottom of
the partial order. This interface blocks for every potential input. In contrast,
Boolean $\top$ plays no special role in the partial order. While $\top$ exhibits totally
non-deterministic outputs, it also accepts every input. A blocking input is considered
"worse" than non-deterministic outputs in the refinement order. The refinement
relation $\preceq$ encodes a direction of conservatism such that any reasoning done over
the abstract models is sound and can be generalized to the concrete model.
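For finite relations, Definition 6 can be checked directly. This sketch stores interfaces as sets of (input, output) pairs and treats any input absent from a set as blocking. The hypothetical relations `A`, `B`, and `C` are loosely modeled on the interfaces of the same names in Fig. 4.

```python
I = range(3)   # hypothetical finite input domain
O = range(3)   # hypothetical finite output domain


def nb(R):
    """Non-blocking inputs of a relation stored as (input, output) pairs."""
    return {i for (i, _) in R}


def refines(F_hat, F):
    """True iff F_hat is an abstraction of F (F_hat <= F) per Definition 6."""
    accepted = nb(F_hat)
    # (11): every input accepted by the abstraction is accepted by F.
    cond11 = accepted <= nb(F)
    # (12): on those inputs, every concrete output is also an abstract output.
    cond12 = all((i, o) in F_hat for (i, o) in F if i in accepted)
    return cond11 and cond12


A = {(0, 0), (1, 1), (2, 2)}       # accepts all inputs, deterministic
B = {(0, 0), (1, 1)}               # like A but blocks input 2
C = A | {(0, 1), (1, 2)}           # like A but with extra output non-determinism
BOTTOM = set()                     # blocks every input
TOP = {(i, o) for i in I for o in O}
```

`B` and `C` each abstract `A` but are incomparable with each other. The empty relation abstracts everything. `TOP` abstracts `C` because `C` accepts every input, yet `TOP` is incomparable with `B`.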


Theorem 1 (Informal Substitutability Result [22]). For any input that
is allowed by the abstract model, the output behaviors exhibited by the abstract
model contain the output behaviors exhibited by the concrete model.
Fig. 4. Example depiction of the refinement partial order. Each small plot
depicts input-output pairs that satisfy an interface's predicate. Inputs (outputs) vary
along the horizontal (vertical) axis. Because $B$ blocks on some inputs but $A$ accepts all
inputs, $B \preceq A$. Interface $C$ exhibits more output non-determinism than $A$, so $C \preceq A$.
Similarly $D \preceq B$, $D \preceq C$, $\top \preceq C$, etc. Note that $B$ and $C$ are incomparable because
$C$ exhibits more output non-determinism and $B$ blocks for more inputs. The false
interface $\bot$ is a universal abstraction, while $\top$ is incomparable with $B$ and $D$.


If a property on outputs has been established for an abstract interface, then 
it still holds if the abstract interface is replaced with the concrete one. Infor- 
mally, the abstract interface is more conservative so if a property holds with the 
abstraction then it must also hold for the true system. All aforementioned inter- 
face operators preserve the properties of the refinement relation of Definition 6, 
in the sense that they are monotone with respect to the refinement partial order. 


Theorem 2 (Composition Preserves Refinement [22]). Let $\hat{A} \preceq A$ and
$\hat{B} \preceq B$. If the composition is well defined, then $\mathrm{comp}(\hat{A}, \hat{B}) \preceq \mathrm{comp}(A, B)$.


Theorem 3 (Output Hiding Preserves Refinement [22]). If $A \preceq B$, then
$\mathrm{ohide}(w, A) \preceq \mathrm{ohide}(w, B)$ for any variable $w$.


Theorem 4 (Input Hiding Preserves Refinement). If $A, B$ are both sink
interfaces and $A \preceq B$, then $\mathrm{ihide}(w, A) \preceq \mathrm{ihide}(w, B)$ for any variable $w$.


Proofs for Theorems 2 and 3 are provided in [22]. Theorem 4's proof is simple
and is omitted. One can think of using interface composition and variable hiding
to horizontally (with respect to the refinement order) navigate the space of all
interfaces. The synthesis pipeline encodes one navigated path, and monotonicity
of these operators yields guarantees about the path's end point. Composite
operators such as $\mathrm{cpre}(\cdot)$ chain together multiple incremental steps. Furthermore,
since the composition of monotone operators is itself a monotone operator, any
composite constructed from these parts is also monotone. In contrast, the coarsening
and refinement operators introduced later in Definitions 8 and 10, respectively,
are used to move vertically and construct abstractions. The "direction"
of new composite operators can easily be established through simple reasoning
about the cumulative directions of their constituent operators.
Fig. 5. Coarsening of the F, interface to 2°, 2* and 2° bins along each dimension for
a fixed v assignment. Interfaces are coarsened within milliseconds for BDDs but the 
runtime depends on the finite abstraction’s data structure representation. 


4.2 Dynamically Coarsening Interfaces 


In practice, the sequence of interfaces $Z_i$ generated during synthesis grows in
complexity. This occurs even if the dynamics $F$ and the target/safe sets have
compact representations (i.e., fewer nodes if using BDDs). Coarsening $F$ and
$Z_i$ combats this growth in complexity by effectively reducing the amount of
information sent between iterations of the fixed point procedure.

Spatial discretization or coarsening is achieved by use of a quantizer interface 
that implicitly aggregates points in a space into a partition or cover. 


Definition 7. A quantizer $Q(i, o)$ is any interface that abstracts the identity
interface $(i == o)$ associated with the signature $(i, o)$.


Quantizers decrease the complexity of the system representation and make 
synthesis more computationally tractable. A coarsening operator abstracts an 
interface by connecting it in series with a quantizer. Coarsening reduces the 
number of non-blocking inputs and increases the output non-determinism. 


Definition 8 (Input/Output Coarsening). Given an interface $F(i, o)$ and
input quantizer $Q(\hat{i}, i)$, input coarsening yields an interface with signature $(\hat{i}, o)$.

$$\mathrm{icoarsen}(F, Q(\hat{i}, i)) = \mathrm{ohide}(i, \mathrm{comp}(Q(\hat{i}, i), F)) \tag{13}$$

Similarly, given an output quantizer $Q(o, \hat{o})$, output coarsening yields an interface
with signature $(i, \hat{o})$.

$$\mathrm{ocoarsen}(F, Q(o, \hat{o})) = \mathrm{ohide}(o, \mathrm{comp}(F, Q(o, \hat{o}))) \tag{14}$$


Figure 5 depicts how coarsening reduces the information required to encode a 
finite interface. It leverages a variable precision quantizer, whose implementation 
is described in the extended version at [1]. 

The corollary below shows that quantizers can be seamlessly integrated into 
the synthesis pipeline while preserving the refinement order. It readily follows 
from Theorems 2, 3, and the quantizer definition. 


Corollary 1. Input and output coarsening operations (13) and (14) are monotone
operations with respect to the interface refinement order $\preceq$.
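The sketch below applies both coarsening operators to a finite relation. It assumes a hypothetical uniform-grid quantizer that maps a point $v$ to bin `v // width`. Output coarsening replaces each output by its bin, which can only add non-determinism. Input coarsening accepts an abstract bin only if every concrete input inside it is accepted, which reflects the robustness term of composition in (13). The accepted bin's outputs are the union of its members' outputs.

```python
# Concrete interface F(i, o) stored as (input, output) pairs; inputs missing
# from the set block. Hypothetical example: o = i + 1 on 0..5, input 6 blocks.
F = {(i, i + 1) for i in range(6)}
CONCRETE_INPUTS = range(7)


def bin_of(v, width=2):
    """Hypothetical uniform-grid quantizer: point v lies in bin v // width."""
    return v // width


def ocoarsen(F, width=2):
    """Eq. (14) on finite sets: exists o. F(i, o) and o lies in bin o_hat."""
    return {(i, bin_of(o, width)) for (i, o) in F}


def icoarsen(F, inputs, width=2):
    """Eq. (13) on finite sets: an input bin is accepted only if every concrete
    input in it is non-blocking; its outputs are the union over the bin."""
    accepted = {i for (i, _) in F}
    result = set()
    for b in {bin_of(i, width) for i in inputs}:
        members = [i for i in inputs if bin_of(i, width) == b]
        if all(i in accepted for i in members):
            result |= {(b, o) for (i, o) in F if i in members}
    return result
```

Bin 3 contains only the blocking input 6, so it is rejected. The input-coarsened interface therefore accepts fewer inputs, and for bin 0 it emits two possible outputs instead of one.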
Fig. 6. Number of BDD nodes (red) and number of states in reach basin (blue) with
respect to the reach game iteration with a greedy quantization. The solid lines result 
from the unmodified game with no coarsening heuristic. The dashed lines result from 
greedy coarsening whenever the winning region exceeds 3000 BDD nodes. (Color figure 
online) 


It is difficult to know a priori where a specific problem instance lies along 
the spectrum between mathematical precision and computational efficiency. It is 
then desirable to coarsen dynamically in response to runtime conditions rather 
than statically beforehand. Coarsening heuristics for reach games include: 


— Downsampling with progress [7]: Initially use coarser system dynamics to
rapidly identify a coarse reach basin. Finer dynamics are used to construct
a more granular set whenever the coarse iteration "stalls". In [7] only the $Z_i$
are coarsened during synthesis. We enable the dynamics $F$ to be coarsened as well.

— Greedy quantization: Selectively coarsen along certain dimensions by
checking at runtime which dimension, when coarsened, would cause $Z_i$ to
shrink the least. This reward function can be leveraged in practice because
coarsening is computationally cheaper than composition. For BDDs, the winning
region can be coarsened until the number of nodes drops below a
desired threshold. Figure 6 shows this heuristic being applied to reduce memory
usage at the expense of answer fidelity. A fixed point is not guaranteed
as long as quantizers can be dynamically inserted into the synthesis pipeline,
but it is guaranteed once quantizers are always inserted at a fixed precision.


The most common quantizer in the literature never blocks and only increases 
non-determinism (such quantizers are called “strict” in [18,19]). If a quantizer is 
interpreted as a partition or cover, this requirement means that the union must 
be equal to an entire space. Definition 7 relaxes that requirement so the union 
can be a subset instead. It also hints at other variants such as interfaces that 
don’t increase output non-determinism but instead block for more inputs. 
5 Refining System Dynamics


Shared refinement [22] is an operation that takes two interfaces and merges them
into a single interface. In contrast to coarsening, it makes interfaces more precise.
Many tools construct system abstractions by starting from the universal
abstraction $\bot$, then iteratively refining it with a collection of smaller interfaces
that represent input-output samples. This approach is especially useful if the
canonical concrete system is a black box function, Simulink model, or source
code file. These representations do not readily lend themselves to predicate
operations and cannot be coarsened directly. We will describe later how other tools
implement a restrictive form of refinement that introduces unnecessary dependencies.

Interfaces can be successfully merged whenever they do not contain contra- 
dictory information. The shared refinability condition below formalizes when 
such a contradiction does not exist. 


Definition 9 (Shared Refinability [22]). Interfaces $F_1(i, o)$ and $F_2(i, o)$ with
identical signatures are shared refinable if

$$\big(nb(F_1) \wedge nb(F_2)\big) \Rightarrow \exists o (F_1 \wedge F_2) \tag{15}$$


For any input that is accepted by all interfaces, the corresponding output sets
must have a non-empty intersection. If multiple interfaces are shared refinable, then
they can be combined into a single one that encapsulates all of their information.


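Definition 10 (Shared Refinement Operation [22]). The shared refinement
operation combines two shared refinable interfaces $F_1$ and $F_2$, yielding a
new interface with an identical signature corresponding to the predicate

$$\mathrm{refine}(F_1, F_2) = \big(nb(F_1) \vee nb(F_2)\big) \wedge \big((nb(F_1) \Rightarrow F_1) \wedge (nb(F_2) \Rightarrow F_2)\big)$$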


The left term expands the set of accepted inputs. The right term signifies that 
if an input was accepted by multiple interfaces, the output must be consistent 
with each of them. The shared refinement operation reduces to disjunction for 
sink interfaces and to conjunction for source interfaces. 

Shared refinement's effect is to move up the refinement order by combining
interfaces. Given a collection of shared refinable interfaces, the shared refinement
operation yields the least upper bound with respect to the refinement partial
order in Definition 6. A violation of (15) can be detected by checking whether the
interfaces fed into $\mathrm{refine}(\cdot)$ fail to be abstractions of the resulting interface.
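The sketch below implements Definitions 9 and 10 on finite relations. It stores interfaces as sets of (input, output) pairs, treats inputs absent from a set as blocking, and uses the refine predicate $(nb(F_1) \vee nb(F_2)) \wedge (nb(F_1) \Rightarrow F_1) \wedge (nb(F_2) \Rightarrow F_2)$. The two hypothetical samples overlap on input 1, where the merged interface keeps only the outputs both samples allow.

```python
OUTPUTS = range(4)  # hypothetical finite output domain


def nb(R):
    """Non-blocking inputs of a relation stored as (input, output) pairs."""
    return {i for (i, _) in R}


def outputs_of(R, i):
    return {o for (j, o) in R if j == i}


def shared_refinable(R1, R2):
    """Eq. (15): on inputs both accept, the output sets must intersect."""
    return all(outputs_of(R1, i) & outputs_of(R2, i) for i in nb(R1) & nb(R2))


def refine(R1, R2):
    """(nb1 or nb2) and (nb1 => R1) and (nb2 => R2), enumerated over OUTPUTS."""
    n1, n2 = nb(R1), nb(R2)
    return {
        (i, o)
        for i in n1 | n2
        for o in OUTPUTS
        if (i not in n1 or (i, o) in R1) and (i not in n2 or (i, o) in R2)
    }


R1 = {(0, 1), (0, 2), (1, 2)}          # sample covering inputs 0 and 1
R2 = {(1, 2), (1, 3), (2, 3)}          # overlapping sample covering 1 and 2
```

If (15) fails, for example when merging `R1` with `{(1, 0)}`, the merged relation blocks on input 1 even though both operands accepted it. The operands are then no longer abstractions of the result.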


5.1 Constructing Finite Interfaces Through Shared Refinement 


A common method to construct finite abstractions is through simulation and 
overapproximation of forward reachable sets. This technique appears in tools 
such as PESSOA [12], SCOTS [19], MASCOT [7], ROCS [9] and ARCS [4]. 
By covering a sufficiently large portion of the interface input space, one can 
construct larger composite interfaces from smaller ones via shared refinement. 


Fig. 7. (Left) Result of sample and coarsen operations for control system interface
$F(x \cup u, x^+)$. The $I$ and $\hat{I}$ interfaces encode the same predicate, but play different roles
as sink and source. (Right) Visualization of finite abstraction as traversing the refinement
partial order. Nodes represent interfaces and edges signify data dependencies for
interface manipulation operators. Multiple refine edges point to a single node because
refinement combines multiple interfaces. Input-output (IO) sample and coarsening are
unary operations, so the resulting nodes only have one incoming edge. The concrete
interface $F$ refines all others, and the final result is an abstraction $\hat{F}$.


Smaller interfaces are constructed by sampling regions of the input space and
constructing an input-output pair. In Fig. 7's left half, a sink interface $I(x \cup u, \emptyset)$
acts as a filter. The source interface $\hat{I}(\emptyset, x \cup u)$ composed with $F(x \cup u, x^+)$
prunes any information that is outside the relevant input region. The original
interface refines any sampled interface. To make samples finite, interface inputs
and outputs are coarsened. An individual sampled abstraction is not useful for
synthesis because it is restricted to a local portion of the interface input space.
After sampling, many finite interfaces are merged through shared refinement. The
assumption $I \Rightarrow nb(F)$ encodes that the dynamics won't raise an error when
simulated and is often made implicitly. Figure 7's right half depicts the sample,
coarsen, and refine operations as methods to vertically traverse the interface
refinement order.

Critically, $\mathrm{refine}(\cdot)$ can be called within the synthesis pipeline and does not
assume that the sampled interfaces are disjoint. Figure 8 shows the results from
refining the dynamics with a collection of state-control hyper-rectangles that
are randomly generated via uniformly sampling their widths and offsets along
each dimension. These hyper-rectangles may overlap. If the same collection of
hyper-rectangles were used in MASCOT, SCOTS, ARCS, or ROCS, then this
would yield a much more conservative abstraction of the dynamics because their
implementations are not robust to overlapping or misaligned samples. PESSOA
and SCOTS circumvent this issue altogether by enforcing disjointness with an
exhaustive traversal of the state-control space, at the cost of unnecessarily coupling
the refinement and sampling procedures. The lunar lander in the extended
version [1] embraces overlapping and uses two misaligned grids to construct a
grid partition with $p^N$ elements with only $p^N (\frac{1}{2})^{N-1}$ samples (where $p$ is the
number of bins along each dimension and $N$ is the interface input dimension).
This technique introduces a small degree of conservatism, but its computational
savings typically outweigh this cost.
Fig. 8. The number of states in the computed reach basin grows with the number of
random samples. The vertical axis is lower bounded by the number of states in the 
target (131k) and upper bounded by 631k, the number of states using an exhaustive
traversal. Naive implementations of the exhaustive traversal would require 12 million 
samples. The right shows basins for 3000 (top) and 6000 samples (bottom). 


6 Decomposed Control Predecessor 


A decomposed control predecessor is available whenever the system state space
consists of a Cartesian product and the dynamics are decomposed componentwise,
such as $F_x$, $F_y$, and $F_\theta$ for the Dubins vehicle. This property is common for
continuous control systems over Euclidean spaces. While one may construct $F$
directly via the abstraction sampling approach, it is often intractable for
higher-dimensional systems. A more sophisticated approach abstracts the lower-dimensional
components $F_x$, $F_y$, and $F_\theta$ individually, computes $F = \mathrm{comp}(F_x, F_y, F_\theta)$,
then feeds it to the monolithic cpre(·) from Proposition 1. This section's approach
is to avoid computing $F$ at all and to decompose the monolithic cpre(·).
It operates by breaking apart the term $\mathrm{ohide}(x^+, \mathrm{comp}(F, Z))$ in such a way
that it respects the decomposition structure. For the Dubins vehicle example,
$\mathrm{ohide}(x^+, \mathrm{comp}(F, Z))$ is replaced with


$$\mathrm{ohide}(p_x^+, \mathrm{comp}(F_x, \mathrm{ohide}(p_y^+, \mathrm{comp}(F_y, \mathrm{ohide}(\theta^+, \mathrm{comp}(F_\theta, Z))))))$$


yielding a sink interface with inputs $p_x$, $p_y$, $v$, $\theta$, and $\omega$. This representation and
the original $\mathrm{ohide}(x^+, \mathrm{comp}(F, Z))$ are equivalent because comp(·) is associative
and the interfaces do not share outputs $x^+ = \{p_x^+, p_y^+, \theta^+\}$. Figure 9 shows multiple
variants of cpre(·) and the improved runtimes obtained when one avoids preemptively
constructing the monolithic interface. The decomposed cpre(·) resembles techniques
that exploit partitioned transition relations in symbolic model checking [5].
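The equivalence between the monolithic and the decomposed predecessor can be checked on a small finite example. The Python sketch below uses a hypothetical two-component system (state (a, b), control u) in place of the Dubins vehicle. As an assumption of the sketch, ohide over a composition with the sink Z is read as "every possible next value of that component must lead into Z", and the outer search over u plays the role of hiding the control input. The monolithic version materializes the joint successor set, as the composed interface F would, while the decomposed version nests one quantifier per component.

```python
from itertools import product

# Hypothetical toy system (not the Dubins vehicle): state (a, b) with a, b in
# {0, 1, 2, 3} and control u in {0, 1}. Each component's successors depend only
# on the current state and control, so the components share no outputs.
STATES = list(product(range(4), range(4)))
CONTROLS = (0, 1)


def f_a(a, u):
    """Nondeterministic next values of component a (made-up dynamics)."""
    return {(a + u) % 4, (a + u + 1) % 4}


def f_b(a, b, u):
    """Next values of component b; reads the current a, never the next a."""
    return {(b + a * u) % 4}


def in_target(a_next, b_next):
    """Sink interface Z over the next state."""
    return a_next in (2, 3) and b_next in (1, 2)


def cpre_monolithic():
    """Materialize the joint successor set first, then check it against Z."""
    result = set()
    for a, b in STATES:
        for u in CONTROLS:
            successors = set(product(f_a(a, u), f_b(a, b, u)))  # joint interface F
            if all(in_target(*s) for s in successors):
                result.add((a, b))
                break
    return result


def cpre_decomposed():
    """Hide one component's next value at a time, innermost (b) first, never
    building the joint successor set."""
    result = set()
    for a, b in STATES:
        for u in CONTROLS:
            if all(all(in_target(a_next, b_next) for b_next in f_b(a, b, u))
                   for a_next in f_a(a, u)):
                result.add((a, b))
                break
    return result


if __name__ == "__main__":
    print(sorted(cpre_decomposed()))  # [(1, 0), (1, 1), (2, 1), (2, 2)]
    assert cpre_monolithic() == cpre_decomposed()
```

Because the nested `all(...)` calls short-circuit and never build the product of successor sets, the decomposed form does less work per state, which mirrors on a tiny scale why avoiding the monolithic interface can pay off.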

No tools from Sect. 1.1 natively support decomposed control predecessors.
We have shown a decomposed abstraction for components composed in parallel,
but this can also be generalized to series composition to capture, for example, a
system where multiple components have different temporal sampling periods.

[Fig. 9 panels: Monolithic cpre(·), Partially Decomposed cpre(·), Fully Decomposed cpre(·)]

Decomposition | Parallel Compose Runtime (s) | Reach Game Runtime (s)
$F$ (Monolithic) | 0.56 | 103.09
$F_x$, $F_{y\theta}$ (Partially Decomp.) | 0.02 | 28.31
$F_{x\theta}$, $F_y$ (Partially Decomp.) | 0.01 | 28.71
$F_{xy}$, $F_\theta$ (Partially Decomp.) | 0.06 | 10.61
$F_x$, $F_y$, $F_\theta$ (Fully Decomp.) | n/a | 4.42

Fig. 9. A monolithic cpre(·) incurs unnecessary pre-processing and synthesis runtime
costs for the Dubins vehicle reach game. Each variant of cpre(·) above composes
the interfaces $F_x$, $F_y$, and $F_\theta$ in different permutations. For example, $F_{xy}$ represents
$\mathrm{comp}(F_x, F_y)$ and $F$ represents $\mathrm{comp}(F_x, F_y, F_\theta)$.


7 Conclusion 


Tackling difficult control synthesis problems will require exploiting all available 
structure in a system with tools that can flexibly adapt to an individual prob- 
lem’s idiosyncrasies. This paper lays a foundation for developing an extensible 
suite of interoperable techniques and demonstrates the potential computational 
gains in an application to controller synthesis with finite abstractions. Adhering 
to a simple yet powerful set of well-understood primitives also constitutes a dis- 
ciplined methodology for algorithm development, which is especially necessary 
if one wants to develop concurrent or distributed algorithms for synthesis. 
Abstract. Reactive systems that operate in environments with complex
data, such as mobile apps or embedded controllers with many sensors, 
are difficult to synthesize. Synthesis tools usually fail for such systems 
because the state space resulting from the discretization of the data is 
too large. We introduce TSL, a new temporal logic that separates con- 
trol and data. We provide a CEGAR-based synthesis approach for the 
construction of implementations that are guaranteed to satisfy a TSL 
specification for all possible instantiations of the data processing func- 
tions. TSL provides an attractive trade-off for synthesis. On the one 
hand, synthesis from TSL, unlike synthesis from standard temporal log- 
ics, is undecidable in general. On the other hand, however, synthesis 
from TSL is scalable, because it is independent of the complexity of the 
handled data. Among other benchmarks, we have successfully synthe- 
sized a music player Android app and a controller for an autonomous 
vehicle in the Open Race Car Simulator (TORCS). 


1 Introduction 


In reactive synthesis, we automatically translate a formal specification, typically 
given in a temporal logic, into a controller that is guaranteed to satisfy the 
specification. Over the past two decades there has been much progress on reac- 
tive synthesis, both in terms of algorithms, notably with techniques like GR(1)- 
synthesis [7] and bounded synthesis [20], and in terms of tools, as showcased, for 
example, in the annual SYNTCOMP competition [25]. 

In practice, however, reactive synthesis has seen limited success. One of the
largest published success stories [6] is the synthesis of the AMBA bus proto- 
col. To push synthesis even further, automatically synthesizing a controller for 
an autonomous system has been recognized to be of critical importance [52].
Despite many years of experience with synthesis tools, our own attempts to syn- 
thesize such controllers with existing tools have been unsuccessful. The reason is 
that the tools are unable to handle the data complexity of the controllers. The 
controller only needs to switch between a small number of behaviors, like steer- 
ing during a bend, or shifting gears on high rpm. The number of control states 
in a typical controller (cf. [18]) is thus not much different from the arbiter in the 
AMBA case study. However, in order to correctly initiate transitions between 
control states, the driving controller must continuously process data from more 
than 20 sensors. 

If this data is included (even as a rough discretization) in the state space of 
the controller, then the synthesis problem is much too large to be handled by 
any available tool. It seems clear, then, that a scalable synthesis approach must
separate control and data. If we assume that the data processing is handled by 
some other approach (such as deductive synthesis [38] or manual programming), 
is it then possible to solve the remaining reactive synthesis problem? 

In this paper, we show that scalable reactive synthesis is indeed possible. Separating
data and control has allowed us to synthesize reactive systems, including
an autonomous driving controller and a music player app, that had been impos- 
sible to synthesize with previously available tools. However, the separation of 
data and control implies some fundamental changes to reactive synthesis, which 
we describe in the rest of the paper. The changes also imply that the reactive 
synthesis problem is no longer, in general, decidable. We thus trade theoretical 
decidability for practical scalability, which is, at least with regard to the goal of 
synthesizing realistic systems, an attractive trade-off. 

We introduce Temporal Stream Logic (TSL), a new temporal logic that
includes updates, such as ⟦y ↢ f x⟧, and predicates over arbitrary function
terms. The update ⟦y ↢ f x⟧ indicates that the result of applying function f
to variable x is assigned to y. The implementation of predicates and functions is
not part of the synthesis problem. Instead, we look for a system that satisfies the 
TSL specification for all possible interpretations of the functions and predicates. 

This implicit quantification over all possible interpretations provides a useful 
abstraction: it allows us to independently implement the data processing part. 
On the other hand, this quantification is also the reason for the undecidability of 
the synthesis problem. If a predicate is applied to the same term twice, it must 
(independently of the interpretation) return the same truth value. The synthesis 
must then implicitly maintain a (potentially infinite) set of terms to which the 
predicate has previously been applied. As we show later, this set of terms can 
be used to encode Post's Correspondence Problem (PCP) [45] for a proof of undecidability.

We present a practical synthesis approach for TSL specifications, which is 
based on bounded synthesis [20] and counterexample-guided abstraction refine- 
ment (CEGAR) [9]. We use bounded synthesis to search for an implementation 
up to an (iteratively growing) bound on the number of states. This approach
underapproximates the actual TSL synthesis problem by leaving the interpre- 
tation of the predicates to the environment. The underapproximation allows 
Fig. 1. The TSL synthesis procedure uses a modular design. Each step takes input
from the previous step as well as interchangeable modules (dashed boxes). 


for inconsistent behaviors: the environment might assign different truth values 
to the same predicate when evaluated at different points in time, even if the 
predicate is applied to the same term. However, if we find an implementation 
in this underapproximation, then the CEGAR loop terminates and we have a 
correct implementation for the original TSL specification. If we do not find an 
implementation in the underapproximation, we compute a counter-strategy for
the environment. Because bounded synthesis reduces the synthesis problem to
a safety game, the counter-strategy is a reachability strategy that can be
represented as a finite tree. We check whether the counter-strategy is spurious by
searching for a pair of positions in the strategy where some predicate results in
different truth values when applied to the same term. If the counter-strategy
is not spurious, then no implementation exists for the considered bound, and
we increase the bound. If the counter-strategy is spurious, then we introduce a
constraint into the specification that eliminates the incorrect interpretation of
the predicate, and continue with the refined specification.
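The loop described above can be sketched in Python as a driver over two callbacks. Both callbacks are hypothetical stand-ins: `solve` represents a bounded LTL synthesis call returning either an implementation or a counter-strategy, and `check_spurious` represents the spuriousness check, returning a new environment assumption or `None` for a genuine counter-strategy. The refinement cap is an addition of this sketch, since the refinement loop is not guaranteed to terminate in general.

```python
from typing import Callable, Hashable, List, Optional, Tuple

# Hypothetical interfaces (not an existing tool's API):
#   solve(assumptions, bound) -> ("realizable", implementation)
#                                or ("unrealizable", counter_strategy)
#   check_spurious(counter_strategy, bound) -> new assumption, or None if genuine


def cegar_bounded_synthesis(
    solve: Callable[[List[Hashable], int], Tuple[str, object]],
    check_spurious: Callable[[object, int], Optional[Hashable]],
    max_bound: int,
    max_refinements: int = 1000,
) -> Tuple[str, object]:
    assumptions: List[Hashable] = []
    bound = 1
    refinements = 0
    while bound <= max_bound:
        verdict, witness = solve(assumptions, bound)
        if verdict == "realizable":
            # An implementation of the underapproximation is also correct
            # for the original specification.
            return "realizable", witness
        new_assumption = check_spurious(witness, bound)
        if new_assumption is None:
            bound += 1  # genuine counter-strategy: no implementation with this bound
        else:
            refinements += 1
            if refinements > max_refinements:
                break  # refinement need not converge, so give up
            assumptions.append(new_assumption)  # rule out the inconsistent evaluation
    return "unknown", None
```

Keeping the solver and the spuriousness check behind callbacks reflects the modular design of Fig. 1, where individual steps can be swapped out.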

A general overview of this procedure is shown in Fig. 1. The top half of the 
figure depicts the bounded search for an implementation that realizes a TSL 
specification using the CEGAR loop to refine the specification. If the specifica- 
tion is realizable, we proceed in the bottom half of the process, where a synthe- 
sized implementation is converted to a control flow model (CFM) determining 
the control of the system. We then specialize the CFM to Functional Reactive 
Programming (FRP), which is a popular and expressive programming paradigm 
for building reactive programs using functional programming languages [14]. 
Code:
  Sys.leaveApp():
    if (MP.musicPlaying())
      Ctrl.pause()
  Sys.resumeApp():
    pos = MP.trackPos()
    Ctrl.play(Tr,pos)

Specification:
  ALWAYS (leaveApp Sys ∧ musicPlaying MP → ⟦Ctrl ↢ pause()⟧)
  ALWAYS (resumeApp Sys → ⟦Ctrl ↢ play Tr (trackPos MP)⟧)

Fig. 2. Sample code and specification for the music player app.


Our framework supports any FRP library using the Arrow or Applicative design 
patterns, which covers most of the existing FRP libraries (e.g. [2,3,10,41]). 
Finally, the synthesized control flow is embedded into a project context, where 
it is equipped with function and predicate implementations and then compiled 
to an executable program. 

Our experience with synthesizing systems based on TSL specifications has 
been extremely positive. The synthesis works for a broad range of benchmarks, 
ranging from classic reactive synthesis problems (like escalator control), through 
programming exercises from functional reactive programming, to novel case stud- 
ies like our music player app and the autonomous driving controller for a vehicle 
in the Open Race Car Simulator (TORCS). 


2 Motivating Example 


To demonstrate the utility of our method, we synthesized a music player Android 
app¹ from a TSL specification. A major challenge in developing Android apps is
the temporal behavior of an app through the Android lifecycle [46]. The Android 
lifecycle describes how an app should handle being paused, when moved to the 
background, coming back into focus, or being terminated. In particular, resume 
and restart errors are commonplace and difficult to detect and correct [46]. Our 
music player app demonstrates a situation in which a resume and restart error 
could be unwittingly introduced when programming by hand, but is avoided by 
providing a specification. We only highlight the key parts of the example here 
to give an intuition of TSL. The complete specification is presented in [19]. 

Our music player app utilizes the Android music player library (MP), as well
as its control interface (Ctrl). It pauses any playing music when moved to the
background (for instance if a call is received), and continues playing the currently 
selected track (Tr) at the last track position when the app is resumed. In the 
Android system (Sys), the leaveApp method is called whenever the app moves to 
the background, while the resumeApp method is called when the app is brought 
back to the foreground. To avoid confusion between pausing music and pausing 
the app, we use leaveApp and resumeApp in place of the Android methods 


¹ https://play.google.com/store/apps/details?id=com.mark.myapplication.
Code:
  bool wasPlaying = false
  Sys.leaveApp():
    if (MP.musicPlaying())
      wasPlaying = true
      Ctrl.pause()
    else
      wasPlaying = false
  Sys.resumeApp():
    if (wasPlaying)
      pos = MP.trackPos()
      Ctrl.play(Tr,pos)

Specification:
  ALWAYS ((leaveApp Sys ∧ musicPlaying MP)
    → (⟦Ctrl ↢ pause()⟧
       ∧ (⟦Ctrl ↢ play Tr (trackPos MP)⟧ AS_SOON_AS resumeApp Sys)))

Fig. 3. The effect of a minor change in functionality on code versus a specification.


onPause and onResume. A programmer might manually write code for this as 
shown on the left in Fig. 2. 

This behavior can be described directly in TSL, as shown on the right
in Fig. 2. Even eliding a formal introduction of the notation for now, the specifi- 
cation closely matches the textual specification. First, when the user leaves the 
app and the music is playing, the music pauses. Likewise for the second part, 
when the user resumes the app, the music starts playing again. 

However, assume we want to change the behavior so that the music only 
plays on resume when the music had been playing before leaving the app 
in the first place. In the manually written program, this new functionality 
requires an additional variable wasPlaying to keep track of the music state. 
Managing the state requires multiple changes in the code as shown on the left 
in Fig. 3. The required code changes include: a conditional in the resumeApp 
method, setting wasPlaying appropriately in two places in leaveApp, and pro- 
viding an initial value. Although a small example, it demonstrates how a minor 
change in functionality may require wide-reaching code changes. In addition, 
this change introduces a globally scoped variable, which then might accidentally 
be set or read elsewhere. In contrast, it is a simple matter to change the TSL 
specification to reflect this new functionality. Here, we only update one part of 
the specification to say that if the user leaves the app and the music is playing, 
the music has to play again as soon as the app resumes. 

Synthesis allows us to specify a temporal behavior without worrying about 
the implementation details. In this example, writing the specification in TSL has 
eliminated the need for an additional state variable, similar to how a higher-order
map eliminates the need for an iteration variable. However, in more complex
examples the benefits compound, as TSL provides a modular interface to spec- 
ify behaviors, offloading the management of multiple interconnected temporal 
behaviors from the user to the synthesis engine. 
3 Preliminaries


We assume time to be discrete and denote it by the set $\mathbb{N}$ of natural numbers (including 0).
A value is an arbitrary object of arbitrary type. $\mathcal{V}$ denotes the set of all values.
The Boolean values are denoted by $\mathbb{B} \subseteq \mathcal{V}$. A stream $s\colon \mathbb{N} \to \mathcal{V}$ is a function
fixing values at each point in time. An $n$-ary function $f\colon \mathcal{V}^n \to \mathcal{V}$ determines
new values from $n$ given values, where the set of all functions (of arbitrary arity)
is given by $\mathcal{F}$. Constants are functions of arity 0. Every constant is a value, i.e.,
is an element of $\mathcal{F} \cap \mathcal{V}$. An $n$-ary predicate $p\colon \mathcal{V}^n \to \mathbb{B}$ checks a property over $n$
values. The set of all predicates (of arbitrary arity) is given by $\mathcal{P}$, where $\mathcal{P} \subseteq \mathcal{F}$.
We use $\mathcal{B}^{\mathcal{A}}$ to denote the set of all total functions with domain $\mathcal{A}$ and image $\mathcal{B}$.

In the classical synthesis setting, inputs and outputs are vectors of Booleans,
where the standard abstraction treats inputs and outputs as atomic propositions
$I \cup O$, while their Boolean combinations form an alphabet $\Sigma = 2^{I \cup O}$. Behavior
then is described through infinite sequences $\alpha = \alpha(0)\alpha(1)\alpha(2)\ldots \in \Sigma^\omega$. A
specification describes a relation between input sequences $\alpha \in (2^{I})^\omega$ and output
sequences $\beta \in (2^{O})^\omega$. Usually, this relation is not given by explicit sequences, but
by a formula in a temporal logic. The most popular such logic is Linear Temporal
Logic (LTL) [43], which uses Boolean connectives to specify behavior at specific
points in time, and temporal connectives to relate sub-specifications over time.
The realizability and synthesis problems for LTL are 2EXPTIME-complete [44].

An implementation describes a realizing strategy, formalized via infinite trees.
A $\Phi$-labeled and $\Upsilon$-branching tree is a function $\sigma\colon \Upsilon^* \to \Phi$, where $\Upsilon$ denotes the
set of branching directions along a tree. Every node of the tree is given by a finite
prefix $v \in \Upsilon^*$, which fixes the path to reach a node from the root. Every node is
labeled by an element of $\Phi$. For infinite paths $\nu \in \Upsilon^\omega$, the branch $\sigma \wr \nu$ denotes the
sequence of labels that appear on $\nu$, i.e., $\forall t \in \mathbb{N}.\ (\sigma \wr \nu)(t) = \sigma(\nu(0) \ldots \nu(t-1))$.
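The branch definition can be read directly as code. In this Python sketch a tree is any function from finite tuples of directions to labels; `branch` and `count_ones` are illustrative names, not part of the formal development, and only a finite prefix of the infinite branch is computed.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

D = TypeVar("D")  # branching directions (Upsilon)
L = TypeVar("L")  # labels (Phi)


def branch(sigma: Callable[[Tuple[D, ...]], L], nu: Sequence[D], length: int) -> List[L]:
    """First `length` labels of the branch sigma ≀ nu: label t is sigma(nu(0) ... nu(t-1))."""
    if length > len(nu) + 1:
        raise ValueError("nu is too short for the requested prefix length")
    return [sigma(tuple(nu[:t])) for t in range(length)]


def count_ones(prefix: Tuple[int, ...]) -> int:
    """Example tree over directions {0, 1}: each node is labeled by the number of 1s on its path."""
    return sum(prefix)


print(branch(count_ones, [1, 0, 1, 1], 5))  # [0, 1, 1, 2, 3]
```

The root (the empty prefix) supplies the label at time 0, which is why a path prefix of length t determines the label at time t.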


4 Temporal Stream Logic


We present a new logic: Temporal Stream Logic (TSL), which is especially 
designed for synthesis and allows for the manipulation of infinite streams of 
arbitrary (even non-enumerative, or higher order) type. It provides a straight- 
forward notation to specify how outputs are computed from inputs, while using 
an intuitive interface to access time. The main focus of TSL is to describe tem- 
poral control flow, while abstracting away concrete implementation details. This 
not only keeps the logic intuitive and simple, but also allows a user to identify 
problems in the control flow even without a concrete implementation at hand. 
In this way, the use of TSL scales up to any required abstraction, such as API 
calls or complex algorithmic transformations. 


Architecture. A TSL formula $\varphi$ specifies a reactive system that in every time step
processes a finite number of inputs $\mathbb{I}$ and produces a finite number of outputs $\mathbb{O}$.
Furthermore, it uses cells $\mathbb{C}$ to store a value computed at time $t$, which can then
be reused in the next time step $t+1$. An overview of the architecture of such a
system is given in Fig. 4a. In terms of behavior, the environment produces infinite
[Fig. 4 panels: (a) Architecture, (b) Term Definitions]


Fig. 4. General architecture of reactive systems that are specified in TSL on the left, 
and the structure of function, predicate and updates on the right. 


streams of input data, while the system uses pure (side-effect free) functions 
to transform the values of these input streams in every time step. After their 
transformation, the data values are either passed to an output stream or are 
passed to a cell, which pipes the output value from one time step back to the 
corresponding input value of the next. The behaviour of the system is captured 
by its infinite execution over time. 


Function Terms, Predicate Terms, and Updates. In TSL we differentiate
between two elements: we use purely functional transformations, reflected by
functions $f \in \mathcal{F}$ and their compositions, and predicates $p \in \mathcal{P}$, used to control
how data flows inside the system. To reason about both elements we use a
term-based notation, where we distinguish between function terms $\tau_F$ and predicate
terms $\tau_P$, respectively. Function terms are either constructed from inputs or cells
($s_i \in \mathbb{I} \cup \mathbb{C}$), or from functions, recursively applied to a set of function terms.
Predicate terms are constructed similarly, by applying a predicate to a set of
function terms. Finally, an update takes the result of a function computation
and passes it either to an output or a cell ($s_o \in \mathbb{O} \cup \mathbb{C}$). An overview of the
syntax of the different term notations is given in Fig. 4b. Note that we use curried
argument notation similar to functional programming languages.

We denote sets of function terms, predicate terms, and updates by $\mathcal{T}_F$, $\mathcal{T}_P$, and
$\mathcal{T}_U$, respectively, where $\mathcal{T}_P \subseteq \mathcal{T}_F$. We use $\mathbb{F}$ to denote the set of function literals
and $\mathbb{P} \subseteq \mathbb{F}$ to denote the set of predicate literals, where the literals $s_i$, $s_o$, $f$,
and $p$ are symbolic representations of inputs and cells, outputs and cells, functions,
and predicates, respectively. Literals are used to construct terms as shown
in Fig. 4b. Since we use a symbolic representation, functions and predicates are
not tied to a specific implementation. However, we still classify them according
to their arity, i.e., the number of function terms they are applied to, as well as by
their type: input, output, cell, function, or predicate. Furthermore, terms can be
compared syntactically using the equivalence relation $\equiv$. To assign a semantic
interpretation to functions, we use an assignment function $\langle\cdot\rangle\colon \mathbb{F} \to \mathcal{F}$.
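A minimal Python sketch of the term syntax of Fig. 4b, using a representation chosen for illustration: frozen dataclasses for signal literals, curried applications (stored as a literal name plus an argument tuple), and updates. Structural equality of these objects plays the role of the syntactic equivalence ≡; no assignment ⟨·⟩ is involved, so `f x` and `g x` are different terms even if some implementation maps them to the same value. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Signal:
    """Input or cell literal (s_i), or output or cell literal (s_o)."""
    name: str


@dataclass(frozen=True)
class App:
    """Curried application f t0 t1 ... t(n-1); a constant has no arguments."""
    fn: str
    args: Tuple["Term", ...] = ()


Term = Union[Signal, App]


@dataclass(frozen=True)
class Update:
    """Update [s_o <- t]: pass the value of function term t to an output or cell."""
    target: Signal
    term: Term


def pretty(t: Term) -> str:
    """Render a term in curried notation, parenthesizing nested applications."""
    if isinstance(t, Signal):
        return t.name
    return " ".join([t.fn] + [_pretty_arg(a) for a in t.args])


def _pretty_arg(a: Term) -> str:
    if isinstance(a, Signal) or not a.args:
        return pretty(a)
    return f"({pretty(a)})"


# Frozen dataclasses compare field by field, so == is syntactic equality here.
track_pos = App("trackPos", (Signal("MP"),))
print(pretty(App("play", (Signal("Tr"), track_pos))))  # play Tr (trackPos MP)
```

Making the classes frozen also makes terms hashable, so sets of predicate terms can be stored directly.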
Inputs, Outputs, and Computations. We consider momentary inputs $\mathbf{i} \in \mathcal{V}^{\mathbb{I}}$,
which are assignments of inputs $i \in \mathbb{I}$ to values $v \in \mathcal{V}$. For the sake of readability
let $\mathcal{I} = \mathcal{V}^{\mathbb{I}}$. Input streams are infinite sequences $\iota \in \mathcal{I}^\omega$ consisting of infinitely
many momentary inputs.

Similarly, a momentary output $\mathbf{o} \in \mathcal{V}^{\mathbb{O}}$ is an assignment of outputs $o \in \mathbb{O}$
to values $v \in \mathcal{V}$, where we also use $\mathcal{O} = \mathcal{V}^{\mathbb{O}}$. Output streams are infinite
sequences $\varrho \in \mathcal{O}^\omega$. To capture the behavior of a cell, we introduce the notion
of a computation $\varsigma$. A computation fixes the function terms that are used to
compute outputs and cell updates, without fixing the semantics of function literals.
Intuitively, a computation only determines which function terms are used to
compute an output, but abstracts from actually computing it.

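The basic element of a computation is a computation step $c \in \mathcal{T}_F^{\mathbb{O} \cup \mathbb{C}}$, which
is an assignment of outputs and cells $s_o \in \mathbb{O} \cup \mathbb{C}$ to function terms $\tau_F \in \mathcal{T}_F$. For
the sake of readability let $\mathcal{C} = \mathcal{T}_F^{\mathbb{O} \cup \mathbb{C}}$. A computation step fixes the control flow
behaviour at a single point in time. A computation $\varsigma \in \mathcal{C}^\omega$ is an infinite sequence
of computation steps.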

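As soon as the input streams and the function and predicate implementations are
known, computations can be turned into output streams. To this end, let
$\langle\cdot\rangle\colon \mathbb{F} \to \mathcal{F}$ be some function assignment. Furthermore, assume that there are
predefined constants $\mathit{init}_c \in \mathcal{F} \cap \mathcal{V}$ for every cell $c \in \mathbb{C}$, which provide an initial
value for each stream at the initial point in time. To obtain an output stream
from a computation $\varsigma \in \mathcal{C}^\omega$ under the input stream $\iota$, we use an evaluation
function $\eta_{\langle\cdot\rangle}\colon \mathcal{C}^\omega \times \mathcal{I}^\omega \times \mathbb{N} \times \mathcal{T}_F \to \mathcal{V}$:

$$\eta_{\langle\cdot\rangle}(\varsigma, \iota, t, s_i) = \begin{cases} \iota(t)(s_i) & \text{if } s_i \in \mathbb{I} \\ \mathit{init}_{s_i} & \text{if } s_i \in \mathbb{C} \wedge t = 0 \\ \eta_{\langle\cdot\rangle}(\varsigma, \iota, t-1, \varsigma(t-1)(s_i)) & \text{if } s_i \in \mathbb{C} \wedge t > 0 \end{cases}$$

$$\eta_{\langle\cdot\rangle}(\varsigma, \iota, t, f\ \tau_0\ \tau_1 \cdots \tau_{m-1}) = \langle f\rangle\ \eta_{\langle\cdot\rangle}(\varsigma, \iota, t, \tau_0)\ \eta_{\langle\cdot\rangle}(\varsigma, \iota, t, \tau_1) \cdots \eta_{\langle\cdot\rangle}(\varsigma, \iota, t, \tau_{m-1})$$

Then $\varrho_{\langle\cdot\rangle, \varsigma, \iota} \in \mathcal{O}^\omega$ is defined via $\varrho_{\langle\cdot\rangle, \varsigma, \iota}(t)(o) = \eta_{\langle\cdot\rangle}(\varsigma, \iota, t, \varsigma(t)(o))$ for all $t \in \mathbb{N}$, $o \in \mathbb{O}$.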
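The evaluation function η can be prototyped in a few lines. The Python sketch below is hypothetical in its names and encodings: a computation is a finite list of steps mapping output and cell names to terms, the input stream is a function from time to a dict, and the assignment ⟨·⟩ is a dict from literal names to Python callables, applied to all arguments at once rather than curried. As an assumption of the sketch, the output o at time t is obtained by evaluating the term ς(t)(o) at time t.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Mapping, Tuple, Union


@dataclass(frozen=True)
class Sig:
    """Input or cell literal."""
    name: str


@dataclass(frozen=True)
class App:
    """Application of a function literal to argument terms."""
    fn: str
    args: Tuple["Term", ...] = ()


Term = Union[Sig, App]
Step = Mapping[str, Term]  # a computation step: output/cell name -> function term


def eta(comp: List[Step], iota: Callable[[int], Mapping[str, object]], t: int, term: Term,
        assign: Mapping[str, Callable[..., object]], inputs: FrozenSet[str],
        init: Mapping[str, object]) -> object:
    if isinstance(term, Sig):
        if term.name in inputs:
            return iota(t)[term.name]  # input: read the input stream at time t
        if t == 0:
            return init[term.name]  # cell at time 0: its initial constant
        # cell at t > 0: evaluate the term assigned to it in step t - 1
        return eta(comp, iota, t - 1, comp[t - 1][term.name], assign, inputs, init)
    args = [eta(comp, iota, t, a, assign, inputs, init) for a in term.args]
    return assign[term.fn](*args)  # apply the interpretation <f>


def output_stream(comp, iota, output, horizon, assign, inputs, init):
    """rho(t)(o) = eta(comp, iota, t, comp[t][o]) for t < horizon."""
    return [eta(comp, iota, t, comp[t][output], assign, inputs, init) for t in range(horizon)]


if __name__ == "__main__":
    step = {"c": App("inc", (Sig("c"),)), "o": App("add", (Sig("c"), Sig("x")))}
    assign = {"inc": lambda v: v + 1, "add": lambda a, b: a + b}
    print(output_stream([step] * 4, lambda t: {"x": 10 * t}, "o", 4,
                        assign, frozenset({"x"}), {"c": 0}))  # [0, 11, 22, 33]
```

In the usage example, the cell c starts at 0 and is updated by `inc c` in every step, so it holds t at time t; with x(t) = 10t, the output `add c x` yields 11t. The recursion depth grows with t, which is fine for short horizons.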


Syntax. Every TSL formula $\varphi$ is built according to the following grammar:

$$\varphi ::= \tau \in \mathcal{T}_P \cup \mathcal{T}_U \mid \neg\varphi \mid \varphi \wedge \varphi \mid \bigcirc\varphi \mid \varphi \,\mathcal{U}\, \varphi$$

An atomic proposition $\tau$ consists either of a predicate term, serving as a Boolean
interface to the inputs, or of an update, enforcing a respective flow at the current
point in time. Next, we have the Boolean operations via negation and conjunction,
which allow us to express arbitrary Boolean combinations of predicate evaluations
and updates. Finally, we have the temporal operator next, $\bigcirc\psi$, to specify
the behavior at the next point in time, and the temporal operator until, $\vartheta \,\mathcal{U}\, \psi$,
which enforces a property $\vartheta$ to hold until the property $\psi$ holds, where $\psi$ must
eventually hold at some point in the future.
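Semantics. Formally, this leads to the following semantics. Let $\langle\cdot\rangle\colon \mathbb{F} \to \mathcal{F}$,
$\iota \in \mathcal{I}^\omega$, and $\varsigma \in \mathcal{C}^\omega$ be given; then the validity of a TSL formula $\varphi$ with respect
to $\varsigma$ and $\iota$ is defined inductively over $t \in \mathbb{N}$ via:

$$\begin{aligned}
\varsigma, \iota, t &\models_{\langle\cdot\rangle} p\ \tau_0 \cdots \tau_{m-1} &&:\Leftrightarrow\ \eta_{\langle\cdot\rangle}(\varsigma, \iota, t, p\ \tau_0 \cdots \tau_{m-1}) \\
\varsigma, \iota, t &\models_{\langle\cdot\rangle} [\![ s \leftarrowtail \tau ]\!] &&:\Leftrightarrow\ \varsigma(t)(s) \equiv \tau \\
\varsigma, \iota, t &\models_{\langle\cdot\rangle} \neg\psi &&:\Leftrightarrow\ \varsigma, \iota, t \not\models_{\langle\cdot\rangle} \psi \\
\varsigma, \iota, t &\models_{\langle\cdot\rangle} \vartheta \wedge \psi &&:\Leftrightarrow\ \varsigma, \iota, t \models_{\langle\cdot\rangle} \vartheta \wedge \varsigma, \iota, t \models_{\langle\cdot\rangle} \psi \\
\varsigma, \iota, t &\models_{\langle\cdot\rangle} \bigcirc\psi &&:\Leftrightarrow\ \varsigma, \iota, t+1 \models_{\langle\cdot\rangle} \psi \\
\varsigma, \iota, t &\models_{\langle\cdot\rangle} \vartheta \,\mathcal{U}\, \psi &&:\Leftrightarrow\ \exists t'' \geq t.\ \forall t \leq t' < t''.\ \varsigma, \iota, t' \models_{\langle\cdot\rangle} \vartheta \wedge \varsigma, \iota, t'' \models_{\langle\cdot\rangle} \psi
\end{aligned}$$

Note that the satisfaction of a predicate depends on the current computation
step and the steps of the past, while for updates it only depends on the current
computation step. Furthermore, updates are only checked syntactically, while
the satisfaction of predicates depends on the given assignment $\langle\cdot\rangle$ and the input
stream $\iota$. We say that $\varsigma$ and $\iota$ satisfy $\varphi$, denoted by $\varsigma, \iota \models_{\langle\cdot\rangle} \varphi$, if $\varsigma, \iota, 0 \models_{\langle\cdot\rangle} \varphi$.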

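Besides the basic operators, we have the standard derived Boolean operators,
as well as the derived temporal operators: release $\vartheta \,\mathcal{R}\, \psi \equiv \neg((\neg\vartheta) \,\mathcal{U}\, (\neg\psi))$,
finally $\Diamond\psi \equiv \mathit{true} \,\mathcal{U}\, \psi$, always $\Box\psi \equiv \mathit{false} \,\mathcal{R}\, \psi$, the weak version of until
$\vartheta \,\mathcal{W}\, \psi \equiv (\vartheta \,\mathcal{U}\, \psi) \vee (\Box\vartheta)$, and as soon as $\vartheta \,\mathcal{A}\, \psi \equiv \neg\psi \,\mathcal{W}\, (\psi \wedge \vartheta)$.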


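Realizability. We are interested in the TSL realizability problem: given a
TSL formula $\varphi$, is there a strategy $\sigma \in \mathcal{C}^{\mathcal{I}^+}$ such that for every input $\iota \in \mathcal{I}^\omega$
and function implementation $\langle\cdot\rangle\colon \mathbb{F} \to \mathcal{F}$, the branch $\sigma \wr \iota$ satisfies $\varphi$, i.e.,

$$\exists \sigma \in \mathcal{C}^{\mathcal{I}^+}.\ \forall \iota \in \mathcal{I}^\omega.\ \forall \langle\cdot\rangle\colon \mathbb{F} \to \mathcal{F}.\ \ \sigma \wr \iota, \iota \models_{\langle\cdot\rangle} \varphi$$

If such a strategy $\sigma$ exists, we say $\sigma$ realizes $\varphi$. If we additionally ask for a
concrete instantiation of $\sigma$, we consider the synthesis problem of TSL.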


5 TSL Properties 


In order to synthesize programs from TSL specifications, we give an overview of 
the first part of our synthesis process, as shown in Fig. 1. First we show how to 
approximate the semantics of TSL through a reduction to LTL. However, due 
to the approximation, finding a realizing strategy immediately may fail. Our
solution is a CEGAR loop that improves the approximation. This CEGAR loop 
is necessary, because the realizability problem of TSL is undecidable in general. 


Approximating TSL with LTL. We approximate TSL formulas with weaker LTL 
formulas. The approximation reinterprets the syntactic elements, $\mathcal{T}_P$ and $\mathcal{T}_U$, as
atomic propositions for LTL. This strips away the semantic meaning of the func- 
tion application and assignment in TSL, which we reconstruct by later adding 
assumptions lazily to the LTL formula. 

Formally, let $\mathcal{T}_P$ and $\mathcal{T}_U$ be the finite sets of predicate terms and updates,
respectively, which appear in $\varphi_{TSL}$. For every assigned signal, we partition $\mathcal{T}_U$
into $\bigcup_{s_o \in \mathbb{O} \cup \mathbb{C}} \mathcal{T}_U^{s_o}$. For every $c \in \mathbb{C}$ let $\mathcal{T}_{U/id}^{c} = \mathcal{T}_U^{c} \cup \{[\![c \leftarrowtail c]\!]\}$, for $o \in \mathbb{O}$ let
$\mathcal{T}_{U/id}^{o} = \mathcal{T}_U^{o}$, and let $\mathcal{T}_{U/id} = \bigcup_{s_o \in \mathbb{O} \cup \mathbb{C}} \mathcal{T}_{U/id}^{s_o}$. We construct the LTL formula $\varphi_{LTL}$
over the input propositions $\mathcal{T}_P$ and output propositions $\mathcal{T}_{U/id}$ as follows:

$$\varphi_{LTL} = \Box\Big( \bigwedge_{s_o \in \mathbb{O} \cup \mathbb{C}}\ \bigvee_{\tau \in \mathcal{T}_{U/id}^{s_o}} \big( \tau \wedge \bigwedge_{\tau' \in \mathcal{T}_{U/id}^{s_o} \setminus \{\tau\}} \neg\tau' \big) \Big) \wedge \mathrm{SyntacticConversion}(\varphi_{TSL})$$

Intuitively, the first part of the equation partially reconstructs the semantic
meaning of updates by ensuring that a signal is not updated with multiple values
at a time. The second part extracts the reactive constraints of the TSL formula
without the semantic meaning of functions and updates.

[Fig. 5 panels: (a) TSL specification, (b) initial approximation, (c) spurious counter-strategy]

Fig. 5. A TSL specification (a) with input x and cell y that is realizable. A winning
strategy is to save x to y as soon as p(x) is satisfied. However, the initial approximation
(b), which is passed to an LTL synthesis solver, is unrealizable, as proven by
the counter-strategy (c) returned by the LTL solver.
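The first conjunct of $\varphi_{LTL}$ can be generated mechanically from the updates appearing in a specification. The Python sketch below assumes an ASCII LTL syntax of the kind accepted by common LTL tools (`G`, `&&`, `||`, `!`) and hypothetical proposition names such as `x_to_y` for ⟦y ↢ x⟧. It adds the identity update for every cell, emits the per-signal "exactly one update" constraint, and checks a single time step against it; it does not implement SyntacticConversion.

```python
from typing import Dict, Iterable, List, Set


def with_identity(updates: Dict[str, List[str]], cells: Iterable[str]) -> Dict[str, List[str]]:
    """Add the identity update [c <- c] (written c_to_c) for every cell c."""
    result = {s: list(us) for s, us in updates.items()}
    for c in cells:
        ident = f"{c}_to_{c}"
        if ident not in result.setdefault(c, []):
            result[c].append(ident)
    return result


def single_update_constraint(updates: Dict[str, List[str]]) -> str:
    """G( AND over signals ( OR over updates u ( u AND NOT every other update of that signal ) ) )."""
    per_signal = []
    for signal in sorted(updates):
        options = updates[signal]
        choices = []
        for u in options:
            others = [f"!{v}" for v in options if v != u]
            choices.append("(" + " && ".join([u] + others) + ")")
        per_signal.append("(" + " || ".join(choices) + ")")
    return "G (" + " && ".join(per_signal) + ")"


def satisfies_single_update(step: Set[str], updates: Dict[str, List[str]]) -> bool:
    """Check one time step: exactly one update proposition per signal is true."""
    return all(sum(u in step for u in us) == 1 for us in updates.values())


if __name__ == "__main__":
    ups = with_identity({"y": ["x_to_y"]}, ["y"])
    print(single_update_constraint(ups))
    # G (((x_to_y && !y_to_y) || (y_to_y && !x_to_y)))
```

The per-signal constraints are exactly what lets sets of update propositions stand in for computation steps, since every satisfying step picks one term per output or cell.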


Theorem 1 ([19]). If $\varphi_{LTL}$ is realizable, then $\varphi_{TSL}$ is realizable.


Note that unrealizability of $\varphi_{LTL}$ does not imply that $\varphi_{TSL}$ is unrealizable. It
may be that we have not added sufficiently many environment assumptions to 
the approximation in order for the system to produce a realizing strategy. 


Example. As an example, we present a simple TSL specification in Fig. 5a. The 
specification asserts that the environment provides an input x for which the 
predicate p x will be satisfied eventually. The system must guarantee that even- 
tually p y holds. According to the semantics of TSL the formula is realizable. 
The system can take the value of x when p x is true and save it to y, thus
guaranteeing that p y is satisfied eventually. This is in contrast to LTL, which has no
semantics for pure functions and treats the evaluation of p y as an environment-controlled
value that does not need to obey the consistency of a pure function.


Refining the LTL Approximation. It is possible that the LTL solver returns a 
counter-strategy for the environment although the original TSL specification is 
realizable. We call such a counter-strategy spurious as it exploits the additional 
freedom of LTL to violate the purity of predicates as made possible by the 
underapproximation. Formally, a counter-strategy is an infinite tree $\pi\colon \mathcal{C}^* \to 2^{\mathcal{T}_P}$,
which provides predicate evaluations in response to possible update assignments
of function terms $\tau_F \in \mathcal{T}_F$ to outputs $o \in \mathbb{O}$. W.l.o.g. we can assume that $\mathbb{O}$, $\mathcal{T}_F$,
and $\mathcal{T}_P$ are finite, as they can always be restricted to the outputs and terms that
appear in the formula. A counter-strategy is spurious iff there is a branch $\pi \wr \varsigma$
for some computation $\varsigma \in \mathcal{C}^\omega$, for which the strategy chooses an inconsistent
evaluation of two equal predicate terms at different points in time, i.e.,
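$$\exists \varsigma \in \mathcal{C}^\omega.\ \exists t, t' \in \mathbb{N}.\ \exists \tau_P \in \mathcal{T}_P.\ \ \tau_P \in \pi(\varsigma(0)\varsigma(1)\ldots\varsigma(t-1)) \wedge \tau_P \notin \pi(\varsigma(0)\varsigma(1)\ldots\varsigma(t'-1)) \wedge \forall \langle\cdot\rangle\colon \mathbb{F} \to \mathcal{F}.\ \eta_{\langle\cdot\rangle}(\varsigma, \pi \wr \varsigma, t, \tau_P) = \eta_{\langle\cdot\rangle}(\varsigma, \pi \wr \varsigma, t', \tau_P)$$

Algorithm 1. Check-Spuriousness
Input: bound $b$, counter-strategy $\pi\colon \mathcal{C}^* \to 2^{\mathcal{T}_P}$ (finitely represented using $m$ states)
1: for all $v \in \mathcal{C}^{m \cdot b}$, $\tau_P \in \mathcal{T}_P$, $t, t' \in \{0, 1, \ldots, m \cdot b - 1\}$ do
2:   if $\eta_{\langle\cdot\rangle_{id}}(v, \iota_{id}, t, \tau_P) \equiv \eta_{\langle\cdot\rangle_{id}}(v, \iota_{id}, t', \tau_P) \wedge \tau_P \in \pi(v_0 \ldots v_{t-1}) \wedge \tau_P \notin \pi(v_0 \ldots v_{t'-1})$ then
3:     $w \leftarrow \mathrm{reduce}(v, \tau_P, t, t')$
4:     return $\bigwedge_{i < t} \bigcirc^i w_i \wedge \bigwedge_{i < t'} \bigcirc^i w_i \rightarrow (\bigcirc^t \tau_P \leftrightarrow \bigcirc^{t'} \tau_P)$
5: return "non-spurious"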


Note that a non-spurious strategy can be inconsistent along multiple branches. 
Due to the definition of realizability, the environment can choose function and
predicate assignments differently against every system strategy.

By purity of predicates in TSL the environment is forced to always return 
the same value for predicate evaluations on equal values. However, this semantic 
property cannot be enforced implicitly in LTL. To resolve this issue we use the 
returned counter-strategy to identify spurious behavior in order to strengthen 
the LTL underapproximation with additional environment assumptions. After 
adding the derived assumptions, we re-execute the LTL synthesizer to check 
whether the added assumptions are sufficient in order to obtain a winning strat- 
egy for the system. If the solver still returns a spurious strategy, we continue 
the loop in a CEGAR fashion until the set of added assumptions is sufficiently 
complete. However, if a non-spurious strategy is returned, we have found a proof 
that the given TSL specification is indeed unrealizable and terminate. 

Algorithm 1 shows how a returned counter-strategy $\pi$ is checked for being
spurious. To this end, it is sufficient to check $\pi$ against system strategies
bounded by the given bound $b$, as we use bounded synthesis [20]. Furthermore,
we can assume w.l.o.g. that $\pi$ is given by a finite-state representation,
which is always possible due to the finite model guarantees of LTL. Also note
that $\pi$, as it is returned by the LTL synthesizer, responds to sequences of sets
of updates $(2^{\mathcal{T}_{U/id}})^*$. However, in our case $(2^{\mathcal{T}_{U/id}})^*$ is an alternative representation
of $\mathcal{C}^*$, due to the additional "single update" constraints added during the
construction of $\varphi_{LTL}$.

The algorithm iterates over all possible responses $v \in \mathcal{C}^{m \cdot b}$ of the system up to depth $m \cdot b$. This is sufficient, since any deeper exploration would result in a state repetition of the cross-product of the finite state representation of $\pi$ and any system strategy bounded by $b$. Hence, the same behaviour could also be generated by a sequence shorter than $m \cdot b$. At the same time, the algorithm iterates over predicate terms $\tau_P \in \mathcal{T}_P$ appearing in $\varphi_{\mathit{LTL}}$ and times $t$ and $t'$ smaller than $m \cdot b$. For each of these elements, spuriousness is checked by comparing the output of $\pi$ for the evaluation of $\tau_P$ at times $t$ and $t'$, which should only differ if the inputs to the predicates are different as well. This can only happen if the passed input terms have been constructed differently over the past. We check this by using the evaluation function $\eta$ equipped with the identity assignment $\langle\cdot\rangle_{\mathit{id}}\colon \mathcal{F} \to \mathcal{F}$, with $\langle f \rangle_{\mathit{id}} = f$ for all $f \in \mathcal{F}$, and the input sequence $\iota_{\mathit{id}}$, with $\iota_{\mathit{id}}(t)(i) = (t, i)$ for all $t \in \mathbb{N}$ and $i \in \mathbb{I}$, which always generates a fresh input. Syntactic inequality of $\eta_{\langle\cdot\rangle_{\mathit{id}}}(v, \iota_{\mathit{id}}, t, \tau_P)$ and $\eta_{\langle\cdot\rangle_{\mathit{id}}}(v, \iota_{\mathit{id}}, t', \tau_P)$ then is a sufficient condition for the existence of an assignment $\langle\cdot\rangle\colon \mathcal{F} \to F$ for which $\tau_P$ evaluates differently at times $t$ and $t'$.

If spurious behaviour of $\pi$ could be found, then the revealing response $v \in \mathcal{C}^*$ is first simplified using $\mathit{reduce}$, which reduces $v$ again to a sequence of sets of updates $w \in (2^{\mathcal{T}_U})^*$ and removes updates that do not affect the behavior of $\tau_P$ at the times $t$ and $t'$, to accelerate the termination of the CEGAR loop. Afterwards, the sequence $w$ is turned into a new assumption that prohibits the spurious behavior, generalized to prevent it even at arbitrary points in time.
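The return statement of Algorithm 1 turns the reduced sequence $w$ into such an assumption. The following Python sketch assembles that formula as a string. It assumes a hypothetical ASCII syntax (`G` for □, `X` for ◯, `&&`, `->`, `<->`, and `<-` for an update arrow), which is not necessarily the tool's input format, and represents each $w_i$ as a list of update strings. Because the worked example that follows in the text relates two different predicate terms ($p\,x$ and $p\,y$), the sketch takes the two terms separately; passing the same term twice gives the form listed in Algorithm 1.

```python
from typing import List


def nxt(n: int, phi: str) -> str:
    """Prefix phi with n next-operators, i.e. the LTL formula X^n phi."""
    return "X " * n + phi


def assumption(w: List[List[str]], t1: int, t2: int, tau1: str, tau2: str) -> str:
    """Build G(premise -> (X^t1 tau1 <-> X^t2 tau2)).

    w[i] holds the updates chosen at time i and must cover times 0 .. max(t1, t2) - 1.
    The conjunction over i < t1 together with the conjunction over i < t2 equals a
    single conjunction over i < max(t1, t2).
    """
    premise = [nxt(i, u) for i in range(max(t1, t2)) for u in w[i]]
    lhs = " && ".join(premise) if premise else "true"
    return f"G({lhs} -> ({nxt(t1, tau1)} <-> {nxt(t2, tau2)}))"


# [y <- x] at time 0 relates p x at t = 0 with p y at t = 1.
print(assumption([["[y <- x]"]], 0, 1, "p x", "p y"))  # G([y <- x] -> (p x <-> X p y))
```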

As an example of this process, reconsider the spurious counter-strategy of Fig. 5c. Already after the first system response $[\![y \leftarrowtail x]\!]$, the environment produces an inconsistency by evaluating $p\,x$ and $p\,y$ differently. This is inconsistent, as the cell $y$ holds the same value at time $t = 1$ as the input $x$ at time $t = 0$. Using Algorithm 1, we generate the new assumption $\square([\![y \leftarrowtail x]\!] \rightarrow (p\,x \leftrightarrow \bigcirc\, p\,y))$. After adding this strengthening, the LTL synthesizer returns a realizability result.
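To make the identity-assignment check concrete, the following Python sketch evaluates predicate terms symbolically on the example above. The term encoding and the name `eta_id` are hypothetical. Functions stay uninterpreted, so the result is a syntax tree, and each input read at time t becomes the fresh symbol `("fresh", t, i)`, mirroring $\iota_{\mathit{id}}(t)(i) = (t, i)$. Syntactically equal trees evaluate equally under every assignment, so an environment that answers them differently behaves spuriously.

```python
from typing import Dict, List, Tuple

# Hypothetical term encoding (an assumption of this sketch, not from the paper):
#   ("in", name)           input signal
#   ("cell", name)         cell
#   ("app", f, arg1, ...)  application of a function or predicate symbol f
Term = Tuple


def eta_id(updates: List[Dict[str, Term]], t: int, term: Term) -> Term:
    """Evaluate `term` at time t under the identity assignment.

    updates[s] maps every cell to the update term the system chose at time s.
    """
    kind = term[0]
    if kind == "in":
        return ("fresh", t, term[1])  # every input read is a fresh symbol
    if kind == "cell":
        if t == 0:
            return ("init", term[1])  # symbolic initial value of the cell
        # the value at time t is what the update chosen at time t - 1 computed
        return eta_id(updates, t - 1, updates[t - 1][term[1]])
    if kind == "app":
        return ("app", term[1]) + tuple(eta_id(updates, t, arg) for arg in term[2:])
    raise ValueError(f"unknown term: {term!r}")


p_x = ("app", "p", ("in", "x"))
p_y = ("app", "p", ("cell", "y"))
v = [{"y": ("in", "x")}]  # the system answers [y <- x] at time 0

print(eta_id(v, 0, p_x) == eta_id(v, 1, p_y))  # True: equal terms, so differing answers are spurious
print(eta_id(v, 0, p_x) == eta_id(v, 1, p_x))  # False: x is read afresh at time 1
```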


Undecidability. Although we can approximate the semantics of TSL with LTL, 
there are TSL formulas that cannot be expressed as LTL formulas of finite size. 


Theorem 2 ([19]). The realizability problem of TSL is undecidable. 


6 TSL Synthesis 


Our synthesis framework provides a modular refinement process to synthesize 
executables from TSL specifications, as depicted in Fig. 1. The user initially 
provides a TSL specification over predicate and function terms. At the end of 
the procedure, the user receives an executable to control a reactive system. 

The first step of our method answers the synthesis question of TSL: if the specification is realizable, then a control flow model is returned. To this end, an intermediate translation to LTL is used, utilizing an LTL synthesis solver that produces circuits in the AIGER format. If the specification is realizable, the resulting control flow model is turned into Haskell code, which is implemented as an independent Haskell module. The user has the choice between two different targets: a module built on Arrows, which is compatible with any Arrowized FRP library, or a module built on Applicative, which supports Applicative FRP libraries. Our procedure generates a single Haskell module per TSL specification. This makes it possible to decompose a project naturally according to individual tasks. Each module provides a single component, which is parameterized by its initial state and the pure function and predicate transformations. As soon as these are provided as part of the surrounding project context, a final executable can be generated by compiling the Haskell code.

An important feature of our synthesis approach is that implementations for the terms used in the specification are only required after synthesis. This allows the user to explore several possible specifications before deciding on any term implementations.

Fig. 6. Example CFM of the music player generated from a TSL specification.


Control Flow Model. The first step of our approach is the synthesis of a Control Flow Model $\mathcal{M}$ (CFM) from the given TSL specification $\varphi$, which provides us with a uniform representation of the control flow structure of our final program. Formally, a CFM $\mathcal{M}$ is a tuple $\mathcal{M} = (\mathbb{I}, \mathbb{O}, \mathbb{C}, V, \ell, \delta)$, where $\mathbb{I}$ is a finite set of inputs, $\mathbb{O}$ is a finite set of outputs, $\mathbb{C}$ is a finite set of cells, $V$ is a finite set of vertices, $\ell\colon V \to \mathcal{F}$ assigns to each vertex a function $f \in \mathcal{F}$ or a predicate $p \in \mathcal{P}$, and

$$\delta\colon (\mathbb{O} \cup \mathbb{C} \cup V) \times \mathbb{N} \to (\mathbb{I} \cup \mathbb{C} \cup V \cup \{\bot\})$$

is a dependency relation that relates every output, cell, and vertex of the CFM with $n \in \mathbb{N}$ arguments, which are either inputs, cells, or vertices. Outputs and cells $s \in \mathbb{O} \cup \mathbb{C}$ always have only a single argument, i.e., $\delta(s, 0) \neq \bot$ and $\forall m > 0.\ \delta(s, m) = \bot$, while for vertices $x \in V$ the number of arguments $n \in \mathbb{N}$ aligns with the arity of the assigned function or predicate $\ell(x)$, i.e., $\forall m \in \mathbb{N}.\ \delta(x, m) = \bot \Leftrightarrow m \geq n$. A CFM is valid if it does not contain circular dependencies, i.e., on every cycle induced by $\delta$ there must lie at least a single cell. We only consider valid CFMs.
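The validity condition can be checked with a standard depth-first cycle search. A minimal Python sketch, assuming a hypothetical encoding in which `deps[x]` lists the non-⊥ arguments δ(x, 0), δ(x, 1), … of each output, cell, and vertex: because any cycle through a cell must use that cell's own dependency edge, dropping the dependency edges of all cells leaves an acyclic graph exactly when every cycle contains a cell.

```python
from typing import Dict, List, Set


def is_valid_cfm(deps: Dict[str, List[str]], cells: Set[str]) -> bool:
    """Return True iff every dependency cycle of the CFM passes through a cell.

    deps[x] lists the non-bottom arguments delta(x, 0), delta(x, 1), ... of each
    output, cell, and vertex x; inputs have no entry.
    """
    # dropping the edges of cells breaks exactly the cycles that contain a cell
    graph = {x: ([] if x in cells else args) for x, args in deps.items()}
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color: Dict[str, int] = {}

    def reaches_cycle(x: str) -> bool:
        color[x] = GREY
        for y in graph.get(x, []):
            c = color.get(y, WHITE)
            if c == GREY or (c == WHITE and reaches_cycle(y)):
                return True
        color[x] = BLACK
        return False

    return not any(color.get(x, WHITE) == WHITE and reaches_cycle(x) for x in graph)


# Valid: the cycle inc -> cnt -> inc passes through the cell cnt.
print(is_valid_cfm({"out": ["inc"], "inc": ["cnt"], "cnt": ["inc"]}, {"cnt"}))  # True
# Invalid: the vertices a and b depend on each other without an intermediate cell.
print(is_valid_cfm({"a": ["b"], "b": ["a"], "out": ["a"]}, set()))  # False
```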

An example CFM for our music player of Sect. 2 is depicted in Fig. 6. Inputs $\mathbb{I}$ come from the left and outputs $\mathbb{O}$ leave on the right. The example contains a single cell $c \in \mathbb{C}$, which holds the stateful memory Cell1, introduced during synthesis for the module. The green, arrow-shaped boxes depict vertices $V$, which are labeled with function and predicate names, according to $\ell$. For the Boolean decisions that define $\delta$, we use circuit symbols for conjunction, disjunction, and negation. Boolean decisions are piped to a multiplexer gate that selects the respective update streams. This allows each update stream to be passed to an
output stream if and only if the respective Boolean trigger evaluates positively, while our construction ensures mutual exclusion on the Boolean triggers. For code generation, the logic gates are implemented using the corresponding dedicated Boolean functions. After building a control structure, we assign semantics to functions and predicates by providing implementations. To this end, we use Functional Reactive Programming (FRP). Prior work has established Causal Commutative Arrows (CCA) as an FRP language pattern equivalent to a CFM [33,34,53]. CCAs are an abstraction subsumed by other functional reactive programming abstractions, such as Monads, Applicative and Arrows [32,33]. There are many FRP libraries using Monads [11,14,42], Applicative [2,3,23,48], or Arrows [10,39,41,51], and since every Monad is also an Applicative, and Applicative and Arrows are both universal design patterns, we can give uniform translations to all of these libraries using translations to just Applicative and Arrows. Both translations are possible due to the flexible notion of a CFM.
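To illustrate how a CFM computes once implementations are supplied, the following Python sketch evaluates one time step of a valid CFM. It is an interpreter written for illustration, not the Arrow or Applicative code the tool generates; the encoding (`deps`, `label`, `impl`) and the three-argument `mux` vertex standing in for a multiplexer gate are assumptions of this sketch. Cells are read with their current value, and their next value is the value of their single argument.

```python
from typing import Any, Callable, Dict, List, Tuple


def cfm_step(
    deps: Dict[str, List[str]],           # delta: non-bottom arguments of outputs, cells, vertices
    label: Dict[str, str],                # l: vertex -> function or predicate name
    impl: Dict[str, Callable[..., Any]],  # user-supplied implementations, by name
    outputs: List[str],
    state: Dict[str, Any],                # current value of every cell
    inputs: Dict[str, Any],               # current value of every input
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Evaluate one time step of a valid CFM; return (output values, next cell values)."""
    cache: Dict[str, Any] = {}

    def value(x: str) -> Any:
        if x in inputs:
            return inputs[x]
        if x in state:
            return state[x]               # cells are read with their current value
        if x not in cache:                # vertex: apply its implementation once per step
            cache[x] = impl[label[x]](*(value(y) for y in deps[x]))
        return cache[x]

    outs = {o: value(deps[o][0]) for o in outputs}
    next_state = {c: value(deps[c][0]) for c in state}
    return outs, next_state


# A button-driven counter; the vertex "sel" plays the role of a multiplexer.
deps = {"pressed": ["btn"], "inc": ["cnt"], "sel": ["pressed", "inc", "cnt"],
        "cnt": ["sel"], "display": ["cnt"]}
label = {"pressed": "isPressed", "inc": "succ", "sel": "mux"}
impl = {"isPressed": lambda b: b, "succ": lambda n: n + 1,
        "mux": lambda trigger, then_val, else_val: then_val if trigger else else_val}

state = {"cnt": 0}
trace = []
for btn in [True, False, True]:
    out, state = cfm_step(deps, label, impl, ["display"], state, {"btn": btn})
    trace.append(out["display"])
print(trace, state)  # [0, 1, 1] {'cnt': 2}
```

The output lags the cell by one step because, in this sketch, the output reads the cell's current value rather than the freshly selected update.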

In the last step, the synthesized FRP program is compiled into an executable, 
using the provided function and predicate implementations. This step is not fixed 
to a single compiler implementation, but in fact can use any FRP compiler (or 
library) that supports a language abstraction at least as expressive as CCA. For 
example, instead of creating an Android music player app, we could target an 
FRP web interface [48] to create an online music player, or an embedded FRP 
library [23] to instantiate the player on a computationally more restricted device. 
By using the strong core of CCA, we can even directly implement the player in hardware, which is, for example, possible with the CλaSH compiler [3]. Note that
we still need separate implementations for functions and predicates for each 
target. However, the specification and synthesized CFM always stay the same. 


7 Experimental Results 


To evaluate our synthesis procedure we implemented a tool that follows the 
structure of Fig. 1. It first encodes the given TSL specification in LTL and then 
refines it until an LTL solver either produces a realizability result or returns a 
non-spurious counter-strategy. For LTL synthesis we use the bounded synthesis 
tool BoSy [15]. As soon as we get a realizing strategy, it is translated to a corresponding CFM. Then, we generate the FRP program structure. Finally, after
providing function implementations the result is compiled into an executable. 

To demonstrate the effectiveness of synthesizing TSL, we applied our tool to 
a collection of benchmarks from different application domains, listed in Table 1. 
Every benchmark class consists of multiple specifications, addressing different 
features of TSL. We created all specifications from scratch, taking care that each of them relates either to an existing textual specification or to a real-world scenario.
A short description of each benchmark class is given in [19]. 

For every benchmark, we report the synthesis time and the size of the synthesized CFM, split into the number of cells ($|\mathbb{C}_\mathcal{M}|$) and vertices ($|V_\mathcal{M}|$) used. The synthesized CFM may use more cells than the original TSL specification if synthesis requires more memory in order to realize a correct control flow.
Table 1. Number of cells $|\mathbb{C}_\mathcal{M}|$ and vertices $|V_\mathcal{M}|$ of the resulting CFM $\mathcal{M}$ and synthesis times for a collection of TSL specifications $\varphi$. A * indicates that the benchmark additionally has an initial condition as part of the specification.

Benchmark (φ)               |φ|  |I|  |O|  |P|  |F|  |C_M|  |V_M|  Time (s)
Button
  default                     7    1    2    1    3      3      8     0.364
Music App
  simple                     91    3    1    4    7      2     25      0.77
  system feedback           103    3    1    5    8      2     31     0.572
  motivating example         87    3    1    5    8      2     70     1.783
FRPZoo
  scenario0                  54    1    3    2    8      4     36     1.876
  scenario5                  50    1    3    2    7      4     32     1.196
  scenario10                 48    1    3    2    7      4     32     1.161
Escalator
  non-reactive                8    0    1    0    1      2      4     0.370
  non-counting               15    2    1    2    4      2     19     0.304
  counting                   34    2    2    3    7      3     23     0.527
  counting*                  43    2    2    3    8      4     43     0.621
  bidirectional             111    2    2    5   10      3    214     4.555
  bidirectional*            124    2    2    5   11      4    287    16.213
  smart                      45    2    1    2    4      4    159    24.016
Slider
  default                    50    1    1    2    4      2     15     0.664
  scored                     67    1    3    4    8      4     62     3.965
  delayed                    71    1    3    4    8      5    159     7.194
Haskell-TORCS
  simple                     40    5    3    2   16      4     37     0.680
  advanced
    gearing                  23    4    1    1    3      2      7     0.403
    accelerating             15    2    2    2    6      3     11     0.391
    steering
      simple                 45    2    1    4    6      2     31     0.459
      improved              100    2    2    4   10      3     26     1.347
      smart                  76    3    2    4    8      5    227     3.375


Table 2. Set of programs that use purity to keep one or two counters in range. Synthesis needs multiple refinements of the specification to prove realizability.

Benchmark (φ)       |φ|  |I|  |O|  |P|  |F|  |C_M|  |V_M|  Refinements  Synthesis time (s)
inrange-single       23    2    1    2    4      2     21            3               0.690
inrange-two          51    3    3    4    7      4    440            6             173.132
graphical-single     55    2    3    2    6      4    343            9            1767.948
graphical-two       113    3    5    4    9      -      -            -             > 10000
The synthesis was executed on a quad-core Intel Xeon processor (E3-1271 v3,
3.6GHz, 32 GB RAM, PC1600, ECC), running Ubuntu 64bit LTS 16.04. 

The experiments of Table 1 show that TSL successfully lifts the applicability of synthesis from the Boolean domain to arbitrary data domains, allowing for new applications that utilize every level of required abstraction. For all benchmarks, we found a realizing system within a reasonable amount of time (at most about 24 s in Table 1), and the results often required synthesized cells to realize the control flow behavior.

We also considered a preliminary set of benchmarks that require multiple 
refinement steps to be synthesizable. An overview of the results is given in 
Table 2. The benchmarks are inspired by examples of the Reactive Banana FRP 
library [2]. Here, purity of function and predicate applications must be utilized 
by the system to ensure that the value of one or two counters never goes out of 
range. The system needs purity not only to verify this condition, but also to take the correct decisions in the implementation to be synthesized.


8 Related Work 


Our approach builds on the rich body of work on reactive synthesis, see [17] for a 
survey. The classic reactive synthesis problem is the construction of a finite-state 
machine that satisfies a specification in a temporal logic like LTL. Our approach 
differs from the classic problem in its connection to an actual programming 
paradigm, namely FRP, and its separation of control and data. 

The synthesis of reactive programs, rather than finite-state machines, has 
previously been studied for standard temporal logic [21,35]. Because there is no 
separation of control and data, these approaches do not directly scale to realistic 
applications. With regard to FRP, a Curry-Howard correspondence between LTL 
and FRP in a dependently typed language was discovered [28,29] and used to 
prove properties of FRP programs [8,30]. However, our paper is the first, to the 
best of our knowledge, to study the synthesis of FRP programs from temporal 
specifications. 

The idea to separate control and data has appeared, on a smaller scale, in the 
synthesis with identifiers, where identifiers, such as the number of a client in a 
mutual exclusion protocol, are treated symbolically [13]. Uninterpreted functions 
have been used to abstract data-related computational details in the synthesis 
of synchronization primitives for complex programs [5]. Another connection to 
other synthesis approaches is our CEGAR loop. Similar refinement loops also 
appear in other synthesis approaches, however with a different purpose, such as
the refinement of environment assumptions [1]. 

So far, there is no immediate connection between our approach and the substantial work on deductive and inductive synthesis, which is specifically concerned with the data-transformation aspects of programs [16,31,40,47,49,50]. Typically, these approaches are focussed on non-reactive sequential programs. An integration of deductive and inductive techniques into our approach for reactive systems is a very promising direction for future work. Abstraction-based synthesis [4,12,24,37] may potentially provide a link between the approaches.
9 Conclusions


We have introduced Temporal Stream Logic, which allows the user to specify 
the control flow of a reactive program. The logic cleanly separates control from 
complex data, forming the foundation for our procedure to synthesize FRP pro- 
grams. By utilizing the purity of function transformations our logic scales inde- 
pendently of the complexity of the data to be handled. While we have shown 
that scalability comes at the cost of undecidability, we addressed this issue by 
using a CEGAR loop, which lazily refines the underapproximation until either 
a realizing system implementation or an unrealizability proof is found. 

Our experiments indicate that TSL synthesis works well in practice and on 
a wide range of programming applications. TSL also provides the foundations 
for further extensions. For example, a user may want to fix the semantics for a 
subset of the functions and predicates. Such refinements can be implemented as 
part of a much richer TSL Modulo Theory framework. 
Abstract. A controller is a device that interacts with a plant. At each time point,
it reads the plant’s state and issues commands with the goal that the plant oper- 
ates optimally. Constructing optimal controllers is a fundamental and challenging 
problem. Machine learning techniques have recently been successfully applied to 
train controllers, yet they have limitations. Learned controllers are monolithic and 
hard to reason about. In particular, it is difficult to add features without retraining, 
to guarantee any level of performance, and to achieve acceptable performance 
when encountering untrained scenarios. These limitations can be addressed by 
deploying quantitative run-time shields that serve as a proxy for the controller. 
At each time point, the shield reads the command issued by the controller and 
may choose to alter it before passing it on to the plant. We show how optimal shields, which interfere as little as possible while guaranteeing a desired level of controller performance, can be generated systematically and automatically using
reactive synthesis. First, we abstract the plant by building a stochastic model. 
Second, we consider the learned controller to be a black box. Third, we mea- 
sure controller performance and shield interference by two quantitative run-time 
measures that are formally defined using weighted automata. Then, the problem 
of constructing a shield that guarantees maximal performance with minimal inter- 
ference is the problem of finding an optimal strategy in a stochastic 2-player game 
“controller versus shield” played on the abstract state space of the plant with a 
quantitative objective obtained from combining the performance and interference 
measures. We illustrate the effectiveness of our approach by automatically con- 
structing lightweight shields for learned traffic-light controllers in various road 
networks. The shields we generate avoid liveness bugs, improve controller per- 
formance in untrained and changing traffic situations, and add features to learned 
controllers, such as giving priority to emergency vehicles. 


1 Introduction 


The controller synthesis problem is a fundamental problem that is widely studied by different communities [42,44]. A controller is a device that interacts with a plant. At each point in time, it reads the plant’s state, e.g., given by sensor readings, and issues a command based on the state. The controller should guarantee that the plant operates
correctly or optimally with respect to some given specification. As a running example, 
we consider a traffic light controller for a road intersection (see Fig. 1). The state of the 
plant refers to the state of the roads leading to the junction; namely, the positions of the 
cars, their speeds, their sizes, etc. A controller command consists of a light configuration 
for the junction in the next time frame. Specifications can either be qualitative, e.g., “it should never be the case that a road with an empty queue gets a green light”, or
quantitative, e.g., “the cost of a controller is the average waiting times of the cars in the 
junction”. 


Fig. 1. On the left, a concrete state depicted in the traffic simulator SUMO. On the right, we depict 
the corresponding abstract state with queues cut off at k = 5, and some outgoing transitions. 
Upon issuing action North-South, a car is evicted from each of the North-South queues. Then, 
we choose uniformly at random, out of the 16 possible options, the incoming cars to the queues, 
update the state, and cut off the queues at k (e.g., when a car enters from East, the queue stays 5).


A challenge in controller synthesis is that, since the number of possible plant read- 
ings is huge, it is computationally demanding to find an optimal command, given a 
plant state. Machine learning is a prominent approach to make decisions based on large amounts of collected data [28,37]. It is widely successful in practice and plays an integral part in the design process of various systems. Machine learning has been successfully applied to train controllers [15,33,34] and specifically controllers for traffic control [20,35,39].

A shortcoming of machine-learning techniques is that the controllers that are pro- 
duced are black-box devices that are hard to reason about and modify without a com- 
plete re-training. It is thus challenging, for example, to obtain worst-case guarantees 
about the controller, which is particularly important in safety-critical settings. Attempts 
to address this problem come from both the formal methods community [46], where 
verification of learned systems is extensively studied [24,29], and the machine-learning 
community, where guarantees are added during the training process using reward engineering [13,18] or by modifying the exploration process [11,19,38]. Both approaches require expertise in the respective field and suffer from limitations: scalability for the first, and intricacy and robustness issues for the second. Moreover, both techniques were mostly studied for safety properties.

Another shortcoming of machine-learning techniques is that they require expertise 
and a fine-tuning of parameters. It is difficult, for example, to train controllers that are robust to changes in plant behavior, e.g., a controller that has been trained on uniform traffic congestion may meet rush-hour traffic, which can be significantly different and can cause poor performance. Also, it is challenging to add features to a controller without retraining, which is both costly and time consuming. These can include permanent features,
e.g., priority to public transport, or temporary changes, e.g., changes due to an accident 
or construction. Again, since the training process is intricate, adding features during 
training can have unexpected effects. 

In this work, we use quantitative shields to deal with the limitations of learned or 
any other black-box controllers. A shield serves as a proxy between the controller and 
the plant. At each point in time, as before, the controller reads the state of the plant
and issues a command. Rather than directly feeding the command to the plant, the 
shield first reads it along with an abstract plant state. The shield can then choose to 
keep the controller’s command or alter it, before issuing the command to the plant. The 
concept of shields was first introduced in [30], where shields for safety properties were 
considered and with a qualitative notion of interference: a shield is only allowed to 
interfere when a controller error occurs, which is only well-defined when considering 
safety properties. We elaborate on other shield-like approaches in Sect. 1.1.

Our goal is to automatically synthesize shields that optimize quantitative measures 
for black-box controllers. We are interested in synthesizing lightweight shields. We 
assume that the controller performs well on average, but has no worst-case guarantees. 
When combining the shield and the controller, intuitively, the controller should be active 
for the majority of the time and the shield intervenes only when it is required. We 
formalize the plant behavior as well as the interference cost using quantitative measures. 
Unlike safety objectives, where it is clear when a shield must interfere, with quantitative 
objectives, a non-interference typically does not have a devastating effect. It is thus 
challenging to decide, at each time point, whether the shield should interfere or not; the 
shield needs to balance the cost of interfering with the decrease in performance of not 
interfering. Automatic synthesis of shields is thus natural in this setting. 

We elaborate on the two quantitative measures we define. The interaction between the plant, controller, and shield gives rise to an infinite sequence over $C \times \Gamma \times \Gamma$, where $C$ is a set of plant states and $\Gamma$ is a set of allowed actions. A triple $(c, \gamma_1, \gamma_2)$ means that the plant is in state $c$, the controller issues command $\gamma_1$, and the shield (possibly) alters it to $\gamma_2$. We use weighted automata, which have proven to be a convenient, flexible, and robust quantitative specification language [14], to assign costs to infinite traces. Our behavioral score measures the performance of the plant and is formally given by a weighted automaton that assigns scores to traces over $C \times \Gamma$. Boolean properties are a special case, which include safety properties, e.g., “an emergency vehicle should always get a green light”, and liveness, e.g., “a car waiting in a queue eventually gets the green light”. An example of a quantitative score is the long-run average of the waiting times of the vehicles in the city. A second score measures the interference of a shield with a controller. It is given by a weighted automaton over the alphabet $\Gamma \times \Gamma$. A simple example of an interference score charges the shield 1 for every change of action and charges 0 when no change is made. Then, the score of an infinite trace can be phrased as the ratio of the time that the shield interferes. Using weighted automata, we can specify more involved scores such as different charges for different types of alterations or even charges that depend on the past, e.g., altering the controller’s command twice in a row is not allowed.
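The simple interference score can be computed on a finite prefix of a trace as follows. This Python sketch is an illustration under stated assumptions: actions are strings, a trace is a list of (controller action, shield action) pairs, and the prefix average stands in for the long-run average on an infinite trace. The second function mimics a two-state weighted automaton for the past-dependent example, modelling “not allowed” as an infinite charge, which is an assumption of the sketch.

```python
import math
from typing import List, Tuple

Trace = List[Tuple[str, str]]  # (action proposed by the controller, action issued by the shield)


def interference_ratio(trace: Trace) -> float:
    """Charge 1 per altered action and 0 otherwise; return the average charge.

    On a finite prefix this is the fraction of steps in which the shield
    interfered; on an infinite trace the score is the long-run average.
    """
    if not trace:
        return 0.0
    return sum(1 for proposed, issued in trace if proposed != issued) / len(trace)


def interference_no_double(trace: Trace) -> float:
    """Two-state variant: the state records whether the previous step altered the
    action; altering twice in a row is charged infinity (i.e., forbidden)."""
    total, previous_altered = 0.0, False
    for proposed, issued in trace:
        altered = proposed != issued
        total += math.inf if (altered and previous_altered) else float(altered)
        previous_altered = altered
    return total / len(trace) if trace else 0.0


trace = [("NS", "NS"), ("NS", "EW"), ("EW", "EW"), ("EW", "EW")]
print(interference_ratio(trace))                             # 0.25
print(interference_no_double([("NS", "EW"), ("EW", "NS")]))  # inf
```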

Given a probabilistic plant model and a formal specification of behavioral and inter- 
ference scores, the problem of synthesizing an optimal shield is well-defined and can be 
solved by game theory. While the game-based techniques we use are those of discrete- 
event controller synthesis [3] in a stochastic setting with quantitative objectives, our 
set-up is quite different. In traditional controller synthesis, there are two entities: the controller and the adversarial plant. The goal is to synthesize a controller offline. In
our setting, there are three entities: the plant, whose behavior we model probabilisti- 
cally, the controller, which we treat as a black-box and model as an adversary, and the 
shield, which we synthesize. Note that the shield’s synthesis procedure is done offline 
but it makes online decisions when it operates together with the controller and plant. 
Our plant model is formally given by a Markov decision process which is a standard 
model with which one models lack of knowledge about the plant using probability (see 
Fig. 1 and details in Example 1). The game is played on the MDP by two players: a
shield and a controller, where the quantitative objective is given by the two scores. An 
optimal shield is then extracted from an optimal strategy for the shield player. The game 
we construct admits memoryless optimal strategies, thus the size of the shield is pro- 
portional to the size of the abstraction of the plant. In addition, it is implemented as a 
look-up table for actions in every state. Thus, the runtime overhead is a table look-up 
and hence negligible. 
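Because optimal shield strategies are memoryless, a deployed shield can be as simple as a dictionary look-up. A minimal Python sketch, assuming a hypothetical table keyed by (abstract state, controller action) and assuming that missing entries mean “do not interfere”; the table entry in the usage example is made up for illustration.

```python
from typing import Dict, Hashable, Tuple


class TableShield:
    """A memoryless shield stored as a look-up table.

    table[(a, gamma)] is the action to issue when the abstract plant state is a
    and the controller proposes gamma. Missing entries mean the shield does not
    interfere (an assumption of this sketch), so the table only needs to store
    the state-action pairs where the shield overrides the controller.
    """

    def __init__(self, table: Dict[Tuple[Hashable, str], str]) -> None:
        self.table = table

    def __call__(self, abstract_state: Hashable, proposed: str) -> str:
        # one dictionary look-up per time step
        return self.table.get((abstract_state, proposed), proposed)


# Hypothetical entry: cars wait North and South but none East or West,
# so a proposed EW phase is replaced by NS.
shield = TableShield({((3, 0, 3, 0), "EW"): "NS"})
print(shield((3, 0, 3, 0), "EW"))  # NS
print(shield((0, 2, 0, 2), "EW"))  # EW
```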

We experiment with our framework by constructing shields for traffic lights in a 
network of roads. Our experimental results illustrate the usefulness of the framework. 
We construct shields that consistently improve the performance of controllers, especially when exhibiting behavior that they are not trained on, but, more surprisingly, also while exhibiting trained behavior. We show that the use of a shield reduces variability
in performance among various controllers, thus when using a shield, the choice of the 
parameters used in the training phase becomes less acute. We show how a shield can be 
used to add the functionality of prioritizing public transport as well as local fairness to a 
controller, both without re-training the controller. In addition, we illustrate how shields 
can add worst-case guarantees on liveness without a costly verification of the controller. 


1.1 Related Work 


A shield-like approach to adding safety to systems is called runtime assurance [47], and 
has applications, for example, in control of robotics [41] and drones [12]. In this frame- 
work, a switching mechanism alternates between running a high-performance system 
and a provably safe one. These works differ from ours since they consider safety specifi- 
cations. As mentioned earlier, a challenge with quantitative specifications is that, unlike 
safety specifications, a non-interference typically does not have a devastating effect, 
thus it is not trivial to decide when and to what extent to interfere. 

Another line of work is runtime enforcement, where an enforcer monitors a program 
that outputs events and can either terminate the program once it detects an error [45], or 
alter the event in order to guarantee, for example, safety [21], richer qualitative objectives [16], or privacy [26,49]. The similarity between an enforcer and a shield lies in their ability to alter events. The settings are quite different, however, since the enforced program is not reactive, whereas we consider a plant that receives commands.

Recently, formal approaches were proposed in order to restrict the exploration of the learning agent such that a set of logical constraints is always satisfied. This method
can support other properties beyond safety, e.g., probabilistic computation tree logic 
(PCTL) [25,36], linear temporal logic (LTL) [1], or differential dynamic logic [17]. 
To the best of our knowledge, quantitative specifications have not yet been considered. 
Unlike these approaches, we consider the learned controller as a black box, thus our 
approach is particularly suitable for machine learning non-experts. 

While MDPs and partially-observable MDPs have been widely studied in the literature w.r.t. quantitative objectives [27,43], our framework requires the interaction of two players (the shield and the black-box controller), and we use a game-theoretic framework with quantitative objectives for our solution.


2 Definitions and Problem Statement 


2.1 Plants, Controllers, and Shields 


The interaction with a plant over a concrete set of states $C$ is carried out using two functionalities: PLANT.GETSTATE returns the plant’s current state and PLANT.ISSUECOMMAND issues an action from a set $\Gamma$. Once an action is issued, the plant updates its state according to some unknown transition function. At each point in time, the controller reads the state of the plant and issues a command. Thus, it is a function from a history in $(C \times \Gamma)^* \cdot C$ to $\Gamma$.

Informally, a shield serves as a proxy between the controller and the plant. At each time point, it reads the controller’s issued action and can choose an alternative action to
issue to the plant. We are interested in light-weight shields that add little or no overhead 
to the controller, thus the shield must be defined w.r.t. an abstraction of the plant, which 
we define formally below. 


Abstraction. An abstraction is a Markov decision process (MDP, for short) $\mathcal{A} = (\Gamma, A, a_0, \delta)$, where $\Gamma$ is a set of actions, $A$ is a set of abstract plant states, $a_0 \in A$ is an initial state, and $\delta\colon A \times \Gamma \to [0,1]^A$ is a probabilistic transition function, i.e., for every $a \in A$ and $\gamma \in \Gamma$, we have $\sum_{a' \in A} \delta(a, \gamma)(a') = 1$. The probabilities in the abstraction model our lack of knowledge of the plant, and we assume that they reflect the behavior exhibited by the plant. A policy $f$ is a function from a finite history of states in $A^*$ to the next action in $\Gamma$; thus, it gives rise to a probability distribution $D(f)$ over infinite sequences over $A$.
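The MDP definition translates directly into a small data structure. In the following Python sketch (an illustration; the names `is_stochastic` and `sample_run` are hypothetical), $\delta(a, \gamma)$ is stored as a dictionary from successor states to probabilities, the condition $\sum_{a'} \delta(a, \gamma)(a') = 1$ is checked with a floating-point tolerance, and a finite prefix of a run is sampled from the distribution $D(f)$ induced by a policy $f$.

```python
import random
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

State = Hashable
Action = str
# delta[(a, gamma)] maps successor states a' to the probability delta(a, gamma)(a')
Delta = Dict[Tuple[State, Action], Dict[State, float]]


def is_stochastic(delta: Delta, tol: float = 1e-9) -> bool:
    """Check that every delta(a, gamma) is a probability distribution over A."""
    return all(
        all(p >= 0.0 for p in dist.values()) and abs(sum(dist.values()) - 1.0) <= tol
        for dist in delta.values()
    )


def sample_run(delta: Delta, a0: State,
               policy: Callable[[Sequence[State]], Action],
               steps: int, rng: random.Random) -> List[State]:
    """Sample a prefix a0 a1 ... a_steps from the distribution D(f) of policy f."""
    history: List[State] = [a0]
    for _ in range(steps):
        gamma = policy(history)              # f maps the history in A* to an action
        dist = delta[(history[-1], gamma)]
        successors = list(dist)
        weights = [dist[s] for s in successors]
        history.append(rng.choices(successors, weights=weights)[0])
    return history


# Toy MDP with two states and a single action.
delta = {("s0", "go"): {"s1": 1.0}, ("s1", "go"): {"s0": 0.5, "s1": 0.5}}
print(is_stochastic(delta))  # True
print(sample_run(delta, "s0", lambda h: "go", 3, random.Random(0)))
```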


Example 1. Consider a plant that represents a junction with four incoming directions
(see Fig. 1). We describe an abstraction $\mathcal{A}$ for the junction that specifies how many cars
are waiting in each queue, where we cut off the count at a parameter $k \in \mathbb{N}$. Formally,
an abstract state is a vector in $\{0, \ldots, k\}^4$, where the indices respectively represent the
North, East, South, and West queues. The larger $k$ is, the closer the abstraction is to
the concrete plant. The set of possible actions represents the possible light directions
in the junction, $\{NS, EW\}$. The abstract transitions estimate the plant behavior, and
we describe them in two steps. Consider an abstract state $a = (a_1, a_2, a_3, a_4)$ and
suppose the issued action is $NS$; the case of $EW$ is similar. We allow a car to cross
the junction from each of the North and South queues and decrease the two queues.
Let $a' = (\max\{0, a_1 - 1\}, a_2, \max\{0, a_3 - 1\}, a_4)$. Next, we probabilistically model
incoming cars to the queues as follows. Consider a vector $(i_1, i_2, i_3, i_4) \in \{0,1\}^4$ that
represents incoming cars to the queues. Let $a''$ be such that, for $1 \le j \le 4$, we add $i_j$ to
the $j$-th queue and trim at $k$, thus $a''_j = \min\{a'_j + i_j, k\}$. Then, in $\mathcal{A}$, when performing
action $NS$ in $a$, we move to $a''$, where each incoming vector is chosen with uniform probability $1/16$.
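The two-step transition function of Example 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation; it assumes that abstract states are 4-tuples ordered (North, East, South, West) and that actions are the strings "NS" and "EW", and the helper name `junction_successors` is hypothetical. Since trimming at k can send several incoming vectors to the same successor, the sketch accumulates the 1/16 contributions into a single distribution.

```python
from fractions import Fraction
from itertools import product
from typing import Dict, Tuple

AbsState = Tuple[int, int, int, int]  # (North, East, South, West) queue lengths


def junction_successors(a: AbsState, action: str, k: int) -> Dict[AbsState, Fraction]:
    """Distribution delta(a, action) of the junction abstraction (hypothetical helper)."""
    # Step 1: one car crosses from each queue that has a green light.
    green = (0, 2) if action == "NS" else (1, 3)
    after_cross = tuple(max(0, q - 1) if j in green else q for j, q in enumerate(a))
    # Step 2: every incoming vector in {0,1}^4 has probability 1/16; queues are
    # trimmed at k, so different vectors may lead to the same successor.
    dist: Dict[AbsState, Fraction] = {}
    for incoming in product((0, 1), repeat=4):
        succ = tuple(min(q + i, k) for q, i in zip(after_cross, incoming))
        dist[succ] = dist.get(succ, Fraction(0)) + Fraction(1, 16)
    return dist
```

For instance, with k = 2, issuing EW in (2, 0, 0, 2) first yields (2, 0, 0, 1); the North queue is already at the cut-off, so each of the 8 distinct successors receives probability 2/16 = 1/8.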


We define shields formally. Let $\Gamma$ be a set of commands, $M$ a set of memory states,
$C$ and $A$ sets of concrete and abstract states, respectively, and let $\alpha : C \to A$ be
a mapping between the two. A shield is a function $\text{SHIELD} : A \times \Gamma \times M \to \Gamma \times M$
together with an initial memory state $m_0 \in M$. We use PLANT to refer to the plant,
which, recall, has two functionalities: reading the current state and issuing a command
from $\Gamma$. Let CONT be a controller, which has a single functionality: given a history of
plant states, it returns the command to issue to the plant. The interaction of
the components is captured in the following pseudo code:


m ← m₀ ∈ M and π ← empty sequence

while true do
    c ← PLANT.GETSTATE() ∈ C
    γ ← CONT.GETCOMMAND(π · c)
    a = α(c) ∈ A                  // generate abstract state for shield
    ⟨γ', m'⟩ ← SHIELD(a, γ, m)
    PLANT.ISSUECOMMAND(γ')
    m ← m'                        // update shield memory state
    π ← π · ⟨c, γ'⟩               // update plant history
end while
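The loop above can be rendered as a runnable Python sketch. The plant, controller, and shield below are hypothetical toy stand-ins for PLANT, CONT, and SHIELD (the method names `get_state`, `issue_command`, and `get_command` are our own), the infinite loop is bounded by a step count, and the plant history π records the command that was actually issued.

```python
from typing import Callable, List, Tuple


class CounterPlant:
    """Toy plant (hypothetical): its state counts how often EW has been issued."""

    def __init__(self) -> None:
        self.state = 0

    def get_state(self) -> int:                  # plays the role of PLANT.GETSTATE
        return self.state

    def issue_command(self, cmd: str) -> None:   # plays the role of PLANT.ISSUECOMMAND
        if cmd == "EW":
            self.state += 1


class AlwaysNS:
    """Toy controller (hypothetical): ignores the history and always asks for NS."""

    def get_command(self, history, c) -> str:    # plays the role of CONT.GETCOMMAND
        return "NS"


def alternating_shield(a, gamma: str, m: int) -> Tuple[str, int]:
    """Toy shield (hypothetical): overrides to EW whenever its 1-bit memory is 1."""
    return ("EW" if m == 1 else gamma), 1 - m


def run_loop(plant, controller, shield: Callable, alpha: Callable, m0, steps: int) -> List[Tuple[object, str]]:
    """Bounded version of the interaction loop; returns the plant history pi."""
    m = m0
    history: List[Tuple[object, str]] = []
    for _ in range(steps):                           # stands in for "while true"
        c = plant.get_state()
        gamma = controller.get_command(history, c)   # controller sees pi . c
        a = alpha(c)                                 # abstract state for the shield
        gamma_prime, m_next = shield(a, gamma, m)
        plant.issue_command(gamma_prime)
        m = m_next                                   # update shield memory state
        history.append((c, gamma_prime))             # update plant history
    return history
```

With `run_loop(CounterPlant(), AlwaysNS(), alternating_shield, lambda c: c, 1, 4)`, the shield overrides every other NS request, and the returned history is `[(0, "EW"), (1, "NS"), (1, "EW"), (2, "NS")]`.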


2.2 Quantitative Objectives for Shields 


We are interested in two types of performance measures for shields. The behavioral 
measure quantifies the quality of the plant’s behavior when operated with a controller 
and shield. The interference measure quantifies the degree to which a shield interferes 
with the controller. Formally, we need to specify values for infinite sequences, and we 
use weighted automata, which are a convenient model to express such values. 


Weighted Automata. A weighted automaton defines a function from infinite strings to values.
Technically, a weighted automaton is similar to a standard automaton, except that the
transitions are labeled, in addition to letters, with numbers (weights). Unlike standard
automata, in which a run is either accepting or rejecting, a run in a weighted automaton
has a value. We focus on limit-average automata, in which the value is the limit average
of the running sum of the weights that the run traverses. Formally, a weighted automaton
is $\mathcal{W} = (\Sigma, Q, q_0, \Delta, cost)$, where $\Sigma$ is a finite alphabet, $Q$ is a finite set of states,
$\Delta \subseteq Q \times \Sigma \times Q$ is a deterministic transition relation, i.e., for every $q \in Q$ and
$\sigma \in \Sigma$, there is at most one $q' \in Q$ with $\Delta(q, \sigma, q')$, and $cost : \Delta \to \mathbb{Q}$ specifies costs
for transitions. A run of $\mathcal{W}$ on an infinite word $\sigma = \sigma_1, \sigma_2, \ldots$ is $r = r_0, r_1, \ldots \in Q^\omega$
such that $r_0 = q_0$ and, for $i \ge 1$, we have $\Delta(r_{i-1}, \sigma_i, r_i)$. Note that $\mathcal{W}$ is
deterministic, so there is at most one run on every word. The value that $\mathcal{W}$ assigns to $\sigma$ is

$$\liminf_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} cost(r_{i-1}, \sigma_i, r_i).$$
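Because the run of a deterministic automaton on an ultimately periodic word u · v^ω is itself eventually periodic, its limit-average value can be computed exactly by simulation. The following Python sketch assumes that transitions and costs are given as dictionaries keyed by (state, letter), which suffices because the automaton is deterministic, and that v is non-empty; the function name `lasso_value` is hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Key = Tuple[Hashable, Hashable]  # (state, letter)


def lasso_value(delta: Dict[Key, Hashable], cost: Dict[Key, float], q0: Hashable,
                prefix: Sequence[Hashable], loop: Sequence[Hashable]) -> float:
    """Limit-average value that a deterministic weighted automaton assigns to prefix . loop^omega."""
    q = q0
    for sigma in prefix:              # read the finite prefix u
        q = delta[(q, sigma)]
    seen: Dict[Hashable, int] = {}    # state at the start of a copy of v -> copy index
    totals: List[float] = []          # total cost of each copy of v
    while q not in seen:
        seen[q] = len(totals)
        total = 0.0
        for sigma in loop:
            total += cost[(q, sigma)]
            q = delta[(q, sigma)]
        totals.append(total)
    cycle = totals[seen[q]:]          # the copies of v that repeat forever
    return sum(cycle) / (len(cycle) * len(loop))
```

Only the repeating part of the run affects the value, so any finite prefix of costs is averaged away; a missing transition raises `KeyError`, matching the case in which the automaton has no run on the word.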


Behavioral Score. A behavioral score measures the quality of the behavior that the
plant exhibits. It is given by a weighted automaton over the alphabet $A \times \Gamma$; thus, it
assigns real values to infinite sequences over $A \times \Gamma$. In our experiments, we use a
concrete behavioral score, which assigns values to infinite sequences over $C \times \Gamma$. We
compare the performance of the plant with various controllers and shields w.r.t. the
concrete score rather than the abstract score. With a weighted automaton, we can express
costs that change over time: for example, we can penalize traffic lights that change
frequently.


Interference Score. The second score we consider measures the interference of the
shield with the controller. An interference score is given by a weighted automaton over
the alphabet $\Gamma \times \Gamma$. With a weighted automaton, we can express costs that change over
time: for example, interfering once costs 1 and any successive interference costs 2; thus,
we reward the shield for short interferences.
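One possible reading of this interference automaton, sketched in Python: it has two states that record whether the previous step was an interference, and we assume that "successive" means "immediately following another interference". The function name is hypothetical, and it only lists the costs along a finite prefix of the run; their average approximates the limit-average value.

```python
from typing import Iterable, List, Tuple


def interference_costs(pairs: Iterable[Tuple[str, str]]) -> List[int]:
    """Costs of a two-state interference automaton over Gamma x Gamma (hypothetical reading)."""
    costs: List[int] = []
    interfered_before = False            # initial state: no interference yet
    for gamma, gamma_prime in pairs:     # letter = (controller action, shield action)
        if gamma != gamma_prime:         # the shield altered the command
            costs.append(2 if interfered_before else 1)
            interfered_before = True
        else:
            costs.append(0)
            interfered_before = False
    return costs
```

For the prefix (NS, NS), (NS, EW), (NS, EW), (EW, EW), (EW, NS), the costs are 0, 1, 2, 0, 1, whose average is 0.8: the two back-to-back interferences cost more than two isolated ones would.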


From Shields and Controllers to Policies. Consider an abstraction MDP $\mathcal{A}$. To ensure
worst-case guarantees, we treat the controller as an adversary for the shield. Let SHIELD
be a shield with memory set $M$ and initial memory state $m_0$. Intuitively, we find a policy
in $\mathcal{A}$ that represents the interaction of SHIELD with a controller that maximizes the cost
incurred. Formally, an abstract controller is a function $\chi : A^* \to \Gamma$. The interaction
between SHIELD and $\chi$ gives rise to a policy $pol(\text{SHIELD}, \chi)$ in $\mathcal{A}$, which, recall, is a
function from $A^*$ to $\Gamma$. We define $pol(\text{SHIELD}, \chi)$ inductively as follows. Consider a
history $\pi \in A^*$ that ends in $a \in A$, and suppose the current memory state of SHIELD is
$m \in M$. Let $\gamma = \chi(\pi)$ and let $(\gamma', m') = \text{SHIELD}(a, \gamma, m)$. Then, the action that the
policy $pol(\text{SHIELD}, \chi)$ assigns is $\gamma'$, and we update the memory state to be $m'$.
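The inductive definition of pol(SHIELD, χ) amounts to threading the shield's memory through the prefixes of an abstract history. A minimal Python sketch follows, with a hypothetical toy shield and an abstract controller that always proposes EW; abstract states are plain integers here purely for illustration.

```python
from typing import Callable, List, Sequence


def policy_actions(shield: Callable, m0, chi: Callable, history: Sequence) -> List:
    """Actions that pol(SHIELD, chi) assigns to each prefix of an abstract history."""
    m = m0
    actions = []
    for n in range(1, len(history) + 1):
        prefix = tuple(history[:n])                    # pi in A*, ending in a = prefix[-1]
        gamma = chi(prefix)                            # abstract controller: A* -> Gamma
        gamma_prime, m = shield(prefix[-1], gamma, m)  # shield output; memory becomes m'
        actions.append(gamma_prime)
    return actions


def threshold_shield(a: int, gamma: str, m: int):
    """Toy shield (hypothetical): forces NS when the abstract state is at least 2; m counts overrides."""
    if a >= 2 and gamma != "NS":
        return "NS", m + 1
    return gamma, m
```

Along the history 0, 3, 1, 2, the controller always asks for EW, and the policy replies EW, NS, EW, NS.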


Problem Definition; Quantitative Shield Synthesis. Consider an abstraction MDP $\mathcal{A}$,
a behavioral score BEH, an interference score INT, both given as weighted automata,
and a factor $\lambda \in [0,1]$ with which we weigh the two scores. Our goal is to find
an optimal shield w.r.t. these inputs, as we define below. Consider a shield SHIELD
with memory set $M$. Let $X$ be the set of abstract controllers. For SHIELD and
$\chi \in X$, let $D(\text{SHIELD}, \chi)$ be the probability distribution over infinite sequences over
$A \times \Gamma \times \Gamma$ that the policy $pol(\text{SHIELD}, \chi)$ gives rise to. The value of SHIELD, denoted
$val(\text{SHIELD})$, is $\sup_{\chi \in X} \mathbb{E}_{r \sim D(\text{SHIELD}, \chi)}[\lambda \cdot \text{INT}(r) + (1 - \lambda) \cdot \text{BEH}(r)]$. An optimal shield is a shield
whose value is $\inf_{\text{SHIELD}} val(\text{SHIELD})$.


Remark 1 (Robustness and flexibility). The problem definition we consider allows
quantitative optimization of shields w.r.t. two dimensions of quantitative measures.
Earlier works have considered shields, but mainly with respect to Boolean measures in both
dimensions. For example, in [30], shields for safety behavioral measures were
constructed with a Boolean notion of interference, as well as a Boolean notion of shield
correctness. In contrast, we allow quantitative objectives in both dimensions, which yields
a much more general and robust framework. For example, the first measure of
correctness can be quantitative and minimize the error rate, and the second measure can allow
shields to correct the controller while minimizing the long-run average interference. Both of
these allow the shield to be flexible. Moreover, tuning the parameter $\lambda$ allows a flexible
tradeoff between the two.

We allow a robust class of quantitative specifications using weighted automata,
which are already established as a robust specification framework. Any automata
model can be used in the framework, not necessarily the ones we use here. For example,
weighted automata that discount the future or process only finite words are suitable for
planning purposes [32]. Thus, our framework is a robust and flexible framework
for quantitative shield synthesis.


2.3 Examples 


In Remark 1 we already discussed the flexibility of the framework. We now present 
concrete examples of instantiations of the optimization problem above on our running 
example, which illustrate how quantitative shields can be used to cope with limitations 
of learned controllers. 


Dealing with Unexpected Plant Behavior; Rush-Hour Traffic. Consider the
abstraction described in Example 1, where each abstract state is a 4-dimensional vector that
represents the number of waiting cars in each direction. The behavioral score we
use is called the max queue. It charges an abstract state $a \in \{0, \ldots, k\}^4$ with the
size of the maximal queue, no matter what the issued action is, thus
$cost_{BEH}(a) = \max_{j \in \{1,2,3,4\}} a_j$. A shield that minimizes the max-queue cost will prioritize the
direction with the largest queue. For the interference score, we use a score that we call the
basic interference score: we charge the shield 1 whenever it changes the controller's
action and 0 otherwise, and take the long-run average of the costs. Recall
that in the construction in Example 1, we chose the vector of incoming cars uniformly at
random. Here, in order to model rush-hour traffic, we use a different distribution,
where we let $p_j$ be the probability that a car enters the $j$-th queue. Then, the probability
of a vector $(i_1, i_2, i_3, i_4) \in \{0,1\}^4$ is $\prod_{1 \le j \le 4} (p_j \cdot i_j + (1 - p_j) \cdot (1 - i_j))$. To model
a higher load traveling on the North-South route, we increase $p_1$ and $p_3$ beyond 0.5.
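Both ingredients of this example can be sketched in a few lines of Python. The per-queue probabilities in `RUSH_HOUR` are hypothetical values chosen only to satisfy p_1, p_3 > 0.5 (queues ordered North, East, South, West); the sketch checks that the product formula defines a distribution over {0,1}^4.

```python
from itertools import product
from math import prod
from typing import Sequence, Tuple


def incoming_prob(vec: Tuple[int, int, int, int], p: Sequence[float]) -> float:
    """Probability of the incoming vector (i_1, ..., i_4) when queue j receives a car w.p. p_j."""
    return prod(pj * ij + (1 - pj) * (1 - ij) for pj, ij in zip(p, vec))


def max_queue_cost(a: Tuple[int, int, int, int]) -> int:
    """Max-queue behavioral cost: the size of the longest queue, independent of the action."""
    return max(a)


# Hypothetical rush-hour parameters: heavier North (p_1) and South (p_3) arrivals.
RUSH_HOUR = (0.7, 0.5, 0.7, 0.5)
total = sum(incoming_prob(v, RUSH_HOUR) for v in product((0, 1), repeat=4))
```

Setting every p_j to 0.5 recovers the uniform probability 1/16 of Example 1.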


Weighing Different Goals; Local Fairness. Suppose the controller is trained to
maximize the number of cars passing through a city. Thus, it aims to maximize the speed of the
cars in the city and prioritizes highways over farm roads. A secondary objective for a
controller is to minimize local queues. Rather than adding this objective in the training
phase, which can have an unexpected outcome, we can add a local shield for each
junction. To synthesize the shield, we use the same abstraction and basic interference score
as above. The behavioral score we use charges an abstract state $a \in \{0, \ldots, k\}^4$
with the difference $|(a_1 + a_3) - (a_2 + a_4)|$; thus, the greater the imbalance between the two
waiting directions, the higher the cost.
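A sketch of the local-fairness cost and, assuming single-state automata for both scores (so that each step's cost depends only on the current letter), of the per-step cost λ · INT + (1 − λ) · BEH that the shield trades off. The function names are hypothetical.

```python
def fairness_cost(a) -> int:
    """Local-fairness cost |(a_1 + a_3) - (a_2 + a_4)| for a = (North, East, South, West)."""
    north, east, south, west = a
    return abs((north + south) - (east + west))


def step_cost(a, gamma: str, gamma_prime: str, lam: float) -> float:
    """Hypothetical per-step cost lam * interference + (1 - lam) * fairness."""
    interference = 1 if gamma != gamma_prime else 0   # basic interference score
    return lam * interference + (1 - lam) * fairness_cost(a)
```

In the state (2, 0, 2, 0) with λ = 0.5, overriding NS with EW costs 0.5 · 1 + 0.5 · 4 = 2.5 for that step.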


Adding Features to the Controller; Prioritizing Public Transport. Suppose a
controller is trained to increase throughput in a junction. After the controller is trained, a
designer wants to add a functionality to the controller that prioritizes buses over
personal vehicles. That is, if a bus is waiting in the North direction, and no bus is waiting
in either the East or West directions, then the light should be North-South, and the other
cases are similar. The abstraction we use is simpler than the ones above since we only
differentiate between whether a bus is present or not; thus, the abstract states are
$\{0,1\}^4$, where the indices represent the directions clockwise starting from North. Let
$\gamma = NS$. The behavioral cost of a state $a$ is 1 when $a_2 = a_4 = 0$ and either $a_1 = 1$ or $a_3 = 1$.
The interference score we use is the basic one. A shield guarantees that in the long run,
the specification is essentially never violated.


3 A Game-Theoretic Approach to Quantitative Shield Synthesis 


In order to synthesize optimal shields we construct a two-player stochastic game [10], 
where we associate Player 2 with the shield and Player 1 with the controller. The game 
is defined on top of an abstraction and the players’ objectives are given by the two 
performance measures. We first formally define stochastic games, then we construct 
the shield synthesis game, and finally show how to extract a shield from a strategy for 
Player 2. 


Stochastic Graph Games. The game is played on a graph by placing a token on a
vertex and letting the players move it throughout the graph. For ease of presentation,
we fix the order in which the players move: first Player 1, then Player 2, and then
“Nature”, i.e., the next vertex is chosen randomly. Edges have costs, which, again for
convenience, appear only on edges following Player 2 moves. Formally, a two-player
stochastic graph game is $(V_1, V_2, V_N, E, Pr, cost)$, where $V = V_1 \cup V_2 \cup V_N$ is a finite
set of vertices that is partitioned into three sets: for $i \in \{1,2\}$, Player $i$ controls the
vertices in $V_i$, and “Nature” controls the vertices in $V_N$; $E \subseteq (V_1 \times V_2) \cup (V_2 \times V_N)$
is a set of deterministic edges; $Pr : V_N \times V_1 \to [0,1]$ is a probabilistic transition
function; and $cost : (V_2 \times V_N) \to \mathbb{Q}$. Suppose the token reaches $v \in V$. If $v \in V_i$,
for $i \in \{1,2\}$, then Player $i$ chooses the next position $u \in V$ of the token such that
$E(v, u)$. If $v \in V_N$, then the next position is chosen randomly; namely, the token moves
to $u \in V$ with probability $Pr[v, u]$.

The game is a zero-sum game: Player 1 tries to maximize the expected long-run
average of the accumulated costs, and Player 2 tries to minimize it. A strategy for
Player $i$, for $i \in \{1,2\}$, is a function that takes a history in $V^* \cdot V_i$ and returns the
next vertex to move the token to. The games we consider admit memoryless optimal
strategies; thus, it suffices to define a Player $i$ strategy as a function from $V_i$ to $V$.
We associate a payoff with two strategies $f_1$ and $f_2$, which we define next. Given
$f_1$ and $f_2$, it is not hard to construct a Markov chain $\mathcal{M}$ with states $V_N$ and with
weights on the edges: for $v, u \in V_N$, the probability of moving from $v$ to $u$ in $\mathcal{M}$
is $Pr_{\mathcal{M}}[v, u] = \sum_{w \in V_1 : f_2(f_1(w)) = u} Pr[v, w]$, and the cost of the edge is
$cost_{\mathcal{M}}(v, u) = \sum_{w \in V_1 : f_2(f_1(w)) = u} Pr[v, w] \cdot cost(f_1(w), u)$. The stationary distribution $s_v$ of a vertex
$v \in V_N$ in $\mathcal{M}$ is a well-known concept [43], and it intuitively measures the long-run
average time that is spent in $v$. Since $cost_{\mathcal{M}}$ already accounts for the transition probabilities,
the payoff w.r.t. $f_1$ and $f_2$, denoted $payoff(f_1, f_2)$, is $\sum_{v, u \in V_N} s_v \cdot cost_{\mathcal{M}}(v, u)$. The payoff of a
strategy is the payoff it guarantees against any strategy of the other player; thus,
$payoff(f_1) = \inf_{f_2} payoff(f_1, f_2)$. A strategy is optimal for Player 1 if it achieves the optimal payoff;
thus, $f$ is optimal if $payoff(f) = \sup_{f_1} payoff(f_1)$. The definitions for Player 2 are dual.
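For fixed memoryless strategies, the payoff reduces to the long-run average cost of a finite Markov chain. The following Python sketch computes the stationary distribution by Gauss-Jordan elimination over exact rationals and then returns the sum over v, u of s_v · P[v][u] · c(v, u), where P is the transition matrix and c(v, u) is the unweighted cost of the edge from v to u. It assumes the chain is irreducible, so that the stationary distribution is unique; the function names are hypothetical.

```python
from fractions import Fraction
from typing import List, Sequence


def stationary(P: Sequence[Sequence[Fraction]]) -> List[Fraction]:
    """Stationary distribution of an irreducible Markov chain (assumption), computed exactly."""
    n = len(P)
    # Balance equations sum_v s_v * (P[v][u] - [v == u]) = 0 for u < n - 1,
    # plus the normalization sum_v s_v = 1 (last column is the right-hand side).
    rows = [[P[v][u] - (1 if v == u else 0) for v in range(n)] + [Fraction(0)]
            for u in range(n - 1)]
    rows.append([Fraction(1)] * n + [Fraction(1)])
    for col in range(n):  # Gauss-Jordan elimination
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        piv = rows[col][col]
        rows[col] = [x / piv for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]


def long_run_average_cost(P, c) -> Fraction:
    """Expected cost per step in the long run: sum_{v,u} s_v * P[v][u] * c[v][u]."""
    s = stationary(P)
    n = len(P)
    return sum((s[v] * P[v][u] * c[v][u] for v in range(n) for u in range(n)), Fraction(0))
```

For the two-state chain with P = [[1/2, 1/2], [1, 0]], the stationary distribution is (2/3, 1/3); if only the edge from state 0 to state 1 costs 3, the long-run average cost is 2/3 · 1/2 · 3 = 1.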
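Constructing the Synthesis Game. Consider an abstraction MDP $\mathcal{A} = (\Gamma, A, a_0, \delta)$,
weighted automata for the behavioral score $\text{BEH} = (A \times \Gamma, Q_{BEH}, q_0^{BEH}, \Delta_{BEH}, cost_{BEH})$
and the interference score $\text{INT} = (\Gamma \times \Gamma, Q_{INT}, q_0^{INT}, \Delta_{INT}, cost_{INT})$, and a factor $\lambda \in [0, 1]$.
We associate Player 1 with the controller and Player 2 with the shield. In each step, the
controller first chooses an action, then the shield chooses whether to alter it, and the
next state is selected at random. Let $S = A \times Q_{INT} \times Q_{BEH}$. We define
$G_{\mathcal{A},\text{BEH},\text{INT},\lambda} = (V_1, V_2, V_N, E, Pr, cost)$, where

- $V_1 = S$,
- $V_2 = S \times \Gamma$,
- $V_N = S \times \Gamma \times \{N\}$, where the purpose of $N$ is to differentiate between the vertices,
- $E(s, \langle s, \gamma \rangle)$ for $s \in S$ and $\gamma \in \Gamma$, and $E(\langle s, \gamma \rangle, \langle s', \gamma', N \rangle)$ for $s = \langle a, q_1, q_2 \rangle \in S$, $\gamma, \gamma' \in \Gamma$, and $s' = \langle a, q'_1, q'_2 \rangle \in S$ such that $\Delta_{INT}(q_1, \langle \gamma, \gamma' \rangle, q'_1)$ and $\Delta_{BEH}(q_2, \langle a, \gamma' \rangle, q'_2)$,
- $Pr[\langle \langle a, q_1, q_2 \rangle, \gamma, N \rangle, \langle a', q_1, q_2 \rangle] = \delta(a, \gamma)(a')$, and
- for $s = \langle a, q_1, q_2 \rangle$ and $s' = \langle a, q'_1, q'_2 \rangle$ as above, we have $cost(\langle s, \gamma \rangle, \langle s', \gamma', N \rangle) = \lambda \cdot cost_{INT}(q_1, \langle \gamma, \gamma' \rangle, q'_1) + (1 - \lambda) \cdot cost_{BEH}(q_2, \langle a, \gamma' \rangle, q'_2)$.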
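The product construction can be sketched in Python as follows. This is an illustration under our own encoding assumptions, not the authors' tool: δ maps (a, γ) to a dictionary of successor probabilities, each automaton is a triple (states, transitions, costs) with transitions and costs keyed by (q, letter), Nature vertices are tagged with the string "N", and only Nature vertices that are targets of shield edges are generated. The name `build_game` is hypothetical.

```python
from itertools import product


def build_game(A, Gamma, delta, int_aut, beh_aut, lam):
    """Sketch of G_{A,BEH,INT,lambda}: returns V1, V2, Player 1 edges, costed Player 2 edges, and Pr."""
    Q_int, d_int, c_int = int_aut
    Q_beh, d_beh, c_beh = beh_aut
    V1 = list(product(A, Q_int, Q_beh))                 # S = A x Q_INT x Q_BEH
    V2 = [(s, g) for s in V1 for g in Gamma]            # S x Gamma
    edges12 = [(s, (s, g)) for s, g in V2]              # the controller proposes g
    edges2N = {}                                        # shield edges -> combined cost
    for s, g in V2:
        a, q1, q2 = s
        for g2 in Gamma:                                # the shield issues g2
            if (q1, (g, g2)) in d_int and (q2, (a, g2)) in d_beh:
                s2 = (a, d_int[(q1, (g, g2))], d_beh[(q2, (a, g2))])
                cost = lam * c_int[(q1, (g, g2))] + (1 - lam) * c_beh[(q2, (a, g2))]
                edges2N[((s, g), (s2, g2, "N"))] = cost
    pr = {}                                             # Nature: ((s, g, N), s') -> probability
    for s2, g2, _tag in {v for (_, v) in edges2N}:
        a, q1, q2 = s2
        for a_next, p in delta[(a, g2)].items():        # abstract move; automata states kept
            pr[((s2, g2, "N"), (a_next, q1, q2))] = p
    return V1, V2, edges12, edges2N, pr
```

With a two-state abstraction, two actions, and single-state automata, the game has 2 Player 1 vertices, 4 Player 2 vertices, and 8 costed shield edges.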


From Strategies to Shields. Recall that the game $G_{\mathcal{A},\text{BEH},\text{INT},\lambda}$ admits memoryless
optimal strategies. Consider an optimal memoryless strategy $f$ for Player 2. Thus,
given a Player 2 vertex in $V_2$, the function $f$ returns a vertex in $V_N$ to move to. The
shield $\text{SHIELD}_f$ that is associated with $f$ has the memory set $M = Q_{INT} \times Q_{BEH}$,
and the initial memory state is $\langle q_0^{INT}, q_0^{BEH} \rangle$. Given an abstract state $a \in A$, a
memory state $\langle q_{INT}, q_{BEH} \rangle \in M$, and a controller action $\gamma \in \Gamma$, let
$\langle a, q'_{INT}, q'_{BEH}, \gamma' \rangle = f(a, q_{INT}, q_{BEH}, \gamma)$. The shield $\text{SHIELD}_f$ returns the action $\gamma'$ and the
updated memory state $\langle q'_{INT}, q'_{BEH} \rangle$.


Theorem 1. Given an abstraction $\mathcal{A}$, weighted automata BEH and INT, and a factor
$\lambda$, the game $G_{\mathcal{A},\text{BEH},\text{INT},\lambda}$ admits optimal memoryless strategies. Let $f$ be an optimal
memoryless strategy for Player 2. The shield $\text{SHIELD}_f$ is an optimal shield w.r.t. $\mathcal{A}$,
BEH, INT, and $\lambda$.


Remark 2 (Shield size). Recall that a shield is a function $\text{SHIELD} : A \times \Gamma \times M \to \Gamma \times M$,
which we store as a table. The size of the shield is the size of the domain,
namely the number of entries in the table. Given an abstraction with $n_1$ states, a set of
possible commands $\Gamma$, and weighted automata with $n_2$ and $n_3$ states, the size of the
shield we construct is $n_1 \cdot n_2 \cdot n_3 \cdot |\Gamma|$.
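The shield associated with a memoryless Player 2 strategy f can be tabulated directly, and the number of entries in the table is the count n_1 · n_2 · n_3 · |Γ| from Remark 2. In this hypothetical Python sketch, f is any function mapping a Player 2 vertex ((a, q_INT, q_BEH), γ) to a Nature vertex ((a, q'_INT, q'_BEH), γ', "N"); the encoding and the name `shield_from_strategy` are our own assumptions.

```python
from itertools import product


def shield_from_strategy(f, A, Gamma, Q_int, Q_beh):
    """Tabulate SHIELD_f : A x Gamma x M -> Gamma x M, with memory M = Q_int x Q_beh."""
    table = {}
    for a, gamma, q_int, q_beh in product(A, Gamma, Q_int, Q_beh):
        (_a, q_int2, q_beh2), gamma2, _tag = f(((a, q_int, q_beh), gamma))
        table[(a, gamma, (q_int, q_beh))] = (gamma2, (q_int2, q_beh2))  # action and new memory
    return table
```

For example, with 3 abstract states, 2 commands, a 2-state interference automaton, and a 1-state behavioral automaton, the table has 3 · 2 · 1 · 2 = 12 entries.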


Remark 3. Our construction of the game can be seen as a two-step procedure: we
construct a stochastic game with two mean-payoff objectives, a.k.a. a two-dimensional
game, where the shield player's goal is to minimize both the behavioral and
interference scores separately. We then reduce the game to a one-dimensional game by
weighing the scores with the parameter $\lambda$. We perform this reduction for several
reasons. First, while multi-dimensional quantitative objectives have been studied in several
cases, such as MDPs [4,6,7] and special problems of stochastic games (e.g.,
almost-sure winning) [2,5,8], there is no general algorithmic solution known for stochastic
games with two-dimensional objectives. Second, even for non-stochastic games with
two-dimensional quantitative objectives, infinite memory is required in general [48].
Finally, in our setting, the parameter $\lambda$ provides a meaningful tradeoff: it can be
associated with how highly we rate the quality of the controller. If the controller is of poor
quality, then we charge the shield less for interference and set $\lambda$ to be low. On the other
hand, for a high-quality controller, we charge the shield more for interferences and set
$\lambda$ to a high value.


4 Case Study 


We experiment with our framework in designing quantitative shields for traffic-light
controllers that are trained using reinforcement learning (RL). We illustrate the
usefulness of shields in dealing with limitations of RL, as well as in providing an intuitive
framework that complements RL techniques.


Traffic Simulation. All experiments were conducted using the traffic simulator
“Simulation of Urban MObility” (SUMO, for short) [31] v0.22 using the SUMO Python API.
Incoming traffic in the cities is chosen randomly. The simulations were executed on a
desktop computer with a 4 × 2.70 GHz Intel Core i7-7500U CPU and 7.7 GB of RAM,
running Ubuntu 16.04.


The Traffic Light Controller. We use RL to train a city-wide traffic-signal controller.
Intuitively, the controller is aware of the waiting cars in each junction, and its actions
constitute a light assignment to all the junctions. We train a controller using a deep
convolutional Q-network [37]. In most of the networks we test with, there are two
controlled junctions. The input to the neural network is a 16-dimensional vector,
where 8 dimensions represent each junction. For each junction, the first four
components state the number of cars approaching the junction and the last four components
state the accumulated waiting time of the cars in each one of the lanes. For
example, in Fig. 1, the first four components are (3, 6, 3, 1); thus, the controller's state is
not trimmed at 5. The controller is trained to minimize both the number of cars
waiting in the queues and the total waiting time. For each junction $i$, the controller can
choose to set the light to be either $NS_i$ or $EW_i$; thus, the set of possible actions is
$\Gamma = \{NS_1NS_2, EW_1NS_2, NS_1EW_2, EW_1EW_2\}$.

We use a network consisting of 4 layers: the input layer is a convolutional layer
with 16 nodes, and the first and second hidden layers consist of 604 nodes
and 1166 nodes, respectively. The output layer consists of 4 neurons with linear activation
functions, each representing one of the actions listed in $\Gamma$ above. The
Q-learning uses the learning rate $\alpha = 0.001$ and the discount factor 0.95 for the
Q-update, and an $\varepsilon$-greedy exploration policy. The artificial neural network is built on an
open source implementation¹ using Keras [9], and additional optimized functionality
was provided by the NumPy [40] library. We train for 100 training epochs, where each
epoch is 1500 seconds of simulated traffic, plus 2000 additional seconds in which no
new cars are introduced. The total training time of the agent is roughly 1.5 hours. While
the RL procedure that we use is simple, it is inspired by standard approaches
to learning traffic controllers and produces controllers that perform relatively well also
with no shield.

¹ https://github.com/Wert1996/Traffic-Optimisation
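The Q-update and exploration policy mentioned above follow the standard Q-learning pattern. The sketch below is a tabular stand-in for the deep Q-network (the network, its reward signal, and the value of ε are not reproduced here); it only illustrates how the reported learning rate 0.001 and discount factor 0.95 enter the update Q(s, a) ← Q(s, a) + α(r + 0.95 · max_a' Q(s', a') − Q(s, a)). In the deep setting, α would instead be the optimizer's step size. The names `epsilon_greedy` and `q_update` are hypothetical.

```python
import random
from typing import Dict, Hashable, List

ACTIONS: List[str] = ["NS1NS2", "EW1NS2", "NS1EW2", "EW1EW2"]
ALPHA = 0.001   # learning rate reported in the text
GAMMA = 0.95    # discount factor reported in the text


def epsilon_greedy(q: Dict[Hashable, List[float]], state, epsilon: float, rng: random.Random) -> int:
    """Explore with probability epsilon; otherwise pick the highest-valued action index."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    values = q[state]
    return max(range(len(ACTIONS)), key=values.__getitem__)


def q_update(q: Dict[Hashable, List[float]], state, action: int, reward: float, next_state) -> None:
    """One tabular Q-learning step toward reward + GAMMA * max_a' Q(next_state, a')."""
    target = reward + GAMMA * max(q[next_state])
    q[state][action] += ALPHA * (target - q[state][action])
```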


The Shield. We synthesize a “local” shield for a junction and copy the shield for each
junction in the city. Recall that the first step in constructing the synthesis game is to
construct an abstraction of the plant, which intuitively represents the information according
to which the shield makes its decisions. The abstraction we use is described in
Example 1; each state is a 4-dimensional integer vector in $\{0, \ldots, k\}^4$, which represents an
abstraction of the number of waiting cars in each direction, cut off at $k \in \mathbb{N}$. As elaborated in
the example, when a shield assigns a green light to a direction, we evict a car from the
two respective queues, and select the incoming cars uniformly at random. Regarding
objectives, in most of our experiments, the behavioral score we use charges an abstract
state $a \in \{0, \ldots, k\}^4$ with $|(a_1 + a_3) - (a_2 + a_4)|$; thus, the shield aims to balance the
total number of waiting cars per direction. The interference score we use charges the
shield 1 for altering the controller's action.

Since we use simple automata for objectives, the size of the shields we use is $|A \times \Gamma|$,
where $|\Gamma| = 2$. In our experiments, we cut off the queues at $k = 6$, which results in a
shield of size 2592. The synthesis procedure's running time is on the order of minutes.
We have already pointed out that we are interested in small light-weight shields, and
this is indeed what we construct. In terms of absolute size, the shield takes ~60 KB,
versus the controller, which takes ~3 MB; a difference of two orders of magnitude.

Our synthesis procedure includes a solution to a stochastic mean-payoff game.
The complexity of solving such games is an interesting combinatorial problem in NP
and coNP (thus unlikely to be NP-hard), for which the existence of a polynomial-time
algorithm is a major long-standing open problem. The current best-known algorithms
are exponential, and even for special cases like turn-based deterministic mean-payoff
games or turn-based stochastic games with reachability objectives, no polynomial-time
algorithms are known. The algorithm we implemented is called the strategy iteration
algorithm [22,23], in which one starts with a strategy and iteratively improves it, where
each iteration requires polynomial time. While the algorithm's worst-case complexity
is exponential, in practice, the algorithm has been widely observed to terminate within a few
iterations.


Evaluating Performance. Throughout all our experiments, we use a unified and con- 
crete measure of performance: the total waiting time of the cars in the city. Our assump- 
tion is that minimizing this measure is the main objective of the designer of the traffic 
light system for the city. While performance is part of the objective function when train- 
ing the controller, the other components of the objective are used in order to improve 
training. Similarly, the behavioral measure we use when synthesizing shields is chosen 
heuristically in order to construct shields that improve concrete performance. 


The Effect of Changing $\lambda$. Recall that we use $\lambda \in [0, 1]$ in order to weigh between
the behavioral and interference measures of a shield, where the larger $\lambda$ is, the more the
shield is charged for interference. In our first set of experiments, we fix all parameters
apart from $\lambda$ and synthesize shields for a city that has two controllable junctions. In the
first experiment, we use a random traffic flow that is similar to the one used in training.

Fig. 2. Results for shields constructed with various $\lambda$ values, together with a fixed plant and
controller, where the simulation traffic distribution matches the one the controller is trained for.


We depict the results of the simulation in Fig. 2. We make several observations on the
results below.

Interference. We observe that the ratio of the time that the shield intervenes is low: for
most values of $\lambda$, the ratio is well below 10%. For large values of $\lambda$, interference is too
costly, and the shields become trivial, namely they never alter the actions of the controller.
The performance we observe is thus the performance of the controller with no shield. In
this set of experiments, we observe that the threshold after which shields become trivial
is $\lambda = 0.5$; the threshold differs across setups.


Performance. We observe that performance as a function of $\lambda$ forms a curve.
When $\lambda$ is small, altering commands is cheap, the shield intervenes more frequently,
and performance drops. This performance drop is expected: the shield is a simple device,
and the quality of its routing decisions cannot compete with that of the trained controller. This
drop is also encouraging, since it illustrates that our experimental setting is interesting.
Surprisingly, we observe that the curve is in fact parabola-shaped: for some values, e.g.,
$\lambda = 0.4$, the shield improves the performance of the controller. We find it unexpected
that the shield improves performance even on the traffic the controller was trained for, and this
performance increase is more pronounced in the next experiments.


Rush-Hour Traffic. In Fig. 3, we use a shield to add robustness to a controller for
behavior it was not trained for. We see a more significant performance gain in this
experiment. We use the controller from the previous experiment, which is trained for uniform
car arrival. We simulate it in a network with “rush-hour” traffic, which we model by
significantly increasing the traffic load in the North-South direction. We synthesize shields
that prefer to evict traffic from the North-South queues over the East-West queues. We
achieve this by altering the objective in the stochastic game: we charge the shield a
greater penalty for cars waiting in these queues than in the other queues. For most values
of $\lambda$ below 0.7, we see a performance gain. Note that the performance of the controller
with no shield is depicted on the far right, where the shield is trivial. An alternative
approach to synthesize a shield would be to alter the probabilities in the abstraction, but
we found that altering the weights results in a better performance gain.

Fig. 3. Similar to Fig. 2, only that the simulation traffic distribution models rush-hour traffic.

Fig. 4. Comparing the variability in performance of the different controllers, with shield (blue) and
without a shield (red). (Color figure online)


Reducing Variability. Machine learning techniques are intricate and require expertise and
fine tuning of parameters. This set of experiments shows how the use of shields reduces
the variability of the controllers and, as a result, reduces the importance of choosing
the optimal parameters in the training phase. We fix one of the shields from the first
experiment with $\lambda = 0.4$. We observe performance in a city with various controllers,
which are trained with varying training parameters, when the controllers are run with
and without the shield and on various traffic conditions that sometimes differ from the
ones they are trained on.

The city we experiment with consists of a main two-lane road that crosses the city
from East to West. The main road has two junctions in which smaller “farm roads”
meet the main road. We refer to the bulk traffic as the traffic that only “crosses the
city”; namely, it flows only on the main road, either from East to West or in the opposite
direction. For $r \in [0, 1]$, Controller-$r$ is trained on traffic in which the ratio of the bulk traffic out of
the total traffic is $r$. That is, the higher $r$ is, the less traffic travels on the farm roads. We
run simulations in which Controller-$r$ observes bulk traffic $k \in [0, 1]$, which it was not
necessarily trained for.
Fig. 5. Results for Controllers-0.65 and 0.9 exhibiting traffic that they are not trained for, with
and without a shield. Performance is the total waiting time of the cars in the city.


In Fig. 5, we depict the performance of two controllers for various traffic settings.
We observe, in these two controllers as well as the others, that operating with a shield
consistently improves performance. The plots illustrate the unexpected behavior of
machine-learning techniques: e.g., when run without a shield, Controller-0.9
outperforms Controller-0.65 in all settings, even in the setting 0.65, on which Controller-0.65
was trained. Thus, a designer who expects a traffic flow of 0.65 would be better
off training with a traffic of 0.9. A shield improves performance and thus reduces the
importance of which training data to use.


Measuring Variability. In Fig. 4, we depict the variability in performance between
the controllers. The higher the variability is, the more important it is to
choose the right parameters when training the controller. Formally, let
$R = \{0.65, 0.7, 0.75, 0.8, 0.85, 0.9\}$. For $r, k \in R$, we let $Perf(r, k)$ denote the performance
(total waiting time) when Controller-$r$ observes bulk traffic $k$. For each $k \in R$, we
depict $\max_{r \in R} Perf(r, k) - \min_{r' \in R} Perf(r', k)$, when operating with and without a
shield.
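The variability measure can be computed as follows; a minimal Python sketch, assuming the measurements are stored in a dictionary keyed by (r, k). The function name is hypothetical, and any numbers used with it for illustration are placeholders, not results from the experiments.

```python
from typing import Dict, Iterable, Tuple


def variability(perf: Dict[Tuple[float, float], float], R: Iterable[float]) -> Dict[float, float]:
    """For each bulk-traffic value k: max_r Perf(r, k) - min_r' Perf(r', k)."""
    R = list(R)
    return {k: max(perf[(r, k)] for r in R) - min(perf[(r, k)] for r in R) for k in R}
```

Computing this once from runs with a shield and once from runs without one yields the two series compared in Fig. 4.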

Clearly, the variability with a shield is significantly lower than without one. This 
data shows that when operating with a shield, it does not make much difference if a 
designer trains a controller with setting r or r’. When operating without a shield, the 
difference is significant. 


Overcoming Liveness Bugs. Finding bugs in learned controllers is a challenging task.
Shields bypass the need to find bugs since they treat the controller as a black box and
correct its behavior. We illustrate their usefulness in dealing with liveness bugs. In the
same network as in the previous setting, we experiment with a controller whose
training process lacked variability. In Fig. 6, we depict the light configuration throughout
the experiment on the main road; the horizontal axis represents time, red indicates a red
light for the main road, and green indicates a green light. Initially, the controller performs well, but
roughly half-way through the simulation it hits a bad state after which the light stays
red. The shield, with only a few interferences, which are represented with dots, manages
to recover the controller from its stuck state. In Fig. 7, we depict the number of waiting
cars in the city, which clearly skyrockets once the controller gets stuck. It is evident that,
initially, the controller performs well. This highlights that it is difficult to recognize
when a controller has a bug: in order to catch such a bug, a designer would need
to find the right simulation and run it half way through before the bug appears.

Fig. 6. The light in the East-West direction (the main road) of a junction. Bottom: with no
shield, the controller is stuck. Top: with a shield, whose interferences are marked with dots.

Fig. 7. The total number of waiting cars (log-scale) with and without a shield. Initially, the
controller performs well on its own, until it gets stuck and traffic in the city freezes.

One way to regain liveness would be to synthesize a shield for the qualitative
property “each direction eventually gets a green light”. Instead, we use a shield that is
synthesized for the quantitative specification as in the previous experiment. The shield,
with a total of only 20 alterations, is able to recover the controller from the bad state it
is stuck in, and traffic flows correctly.


Adding Functionality; Prioritizing Public Transport. Learned controllers are monolithic. Adding functionality to a controller requires complete retraining, which is time-consuming, computationally costly, and requires care; changes in the objective can cause unexpected side effects on the performance. We illustrate how, using a shield, we can add the functionality of prioritizing public transport to an existing controller.

The abstraction over which the shield is constructed slightly differs from the one 
used in the other experiments. The abstract state space is the same, namely four- 
dimensional vectors, though we interpret the entries as the positions of a bus in the 
respective queue. For example, the state (0, 3, 0, 1) represents no bus in the North queue
and a bus which is waiting, third in line, in the East queue. Outgoing edges from an 
abstract state also differ as they take into account, using probability, that vehicles might 
enter the queues between buses. For the behavioral score, we charge an abstract state 
with the sum of its entries, thus the shield is charged whenever buses are waiting and it 
aims to evict them from the queues as soon as possible. 
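A minimal sketch of the behavioral score for the bus abstraction, assuming a hypothetical encoding in which the four entries are ordered North, East, South, West (only the first two positions are fixed by the example above) and 0 means that no bus is waiting. The helper names are illustrative, and averaging over a finite run is an assumed aggregate, not necessarily the one the shield optimizes.

```python
from typing import Sequence, Tuple

# Hypothetical encoding: (North, East, South, West); each entry is the position
# of a bus in that queue, with 0 meaning that no bus is waiting there.
BusState = Tuple[int, int, int, int]


def bus_score(state: BusState) -> int:
    """Charge an abstract state with the sum of its entries."""
    return sum(state)


def average_score(run: Sequence[BusState]) -> float:
    """Average charge along a finite, non-empty run (assumed aggregate)."""
    if not run:
        raise ValueError("run must be non-empty")
    return sum(bus_score(s) for s in run) / len(run)


# (0, 3, 0, 1): no bus in the North queue, a bus third in line in the East queue.
print(bus_score((0, 3, 0, 1)))
```

Every state in which some bus waits has a positive charge, so a shield minimizing this score is pushed to clear buses from the queues quickly.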

In Fig. 8, we depict the performance of all vehicles and of buses only as a function of the weighting factor $\lambda$. The result of this experiment is positive; the predicted behavior is observed. Indeed, when $\lambda$ is small, interferences are cheap, which increases bus performance at the expense of the general performance. The experiment illustrates that the parameter $\lambda$ is a convenient means to control the degree of prioritization of buses.


Local Fairness. In this experiment, we add local fairness to a controller that was trained 
for a global objective. We experiment with a network with four junctions and a city-wide 
controller, which aims to minimize total waiting times. Figure 9 shows that when the controller is deployed on its own, local queues form in the city, whereas a shield synthesized as in the first experiments prevents such local queues from forming.
Fig. 8. The waiting time of buses/all vehicles with shields parameterized by $\lambda$.

Fig. 9. Comparing the number of waiting cars with and without a shield.
5 Discussion and Future Work


We suggest a framework for automatically synthesizing quantitative runtime shields 
to cope with limitations of machine-learning techniques. We show how shields can 
increase robustness to untrained behavior, deal with liveness bugs without verification, 
add features without retraining, and decrease variability of performance due to changes 
in the training parameters, which is especially helpful for machine learning non-experts. 
We use weighted automata to evaluate controller and shield behavior and construct a 
game whose solution is an optimal shield w.r.t. a weighted specification and a plant 
abstraction. The framework is robust and can be applied in any setting where learned or 
other black-box controllers are used. 

We list several directions for further research. In this work, we make no assumptions on the controller and treat it adversarially. Since the controller might have bugs, modelling it as adversarial is reasonable. However, it is also a crude abstraction since, typically, the objectives of the controller and the shield are similar. For future work, we plan to study ways to model the spectrum between cooperative and adversarial controllers, together with solution concepts for the games that they give rise to.

In this work we make no assumptions on the relationship between the plant and the 
abstraction. While the constructed shields are optimal w.r.t. the given abstraction, the 
scores they guarantee w.r.t. the abstraction do not imply performance guarantees on the 
plant. To be able to produce performance guarantees on the concrete plant, we need guarantees on the relationship between the plant and its abstraction. For future work, we plan to study the addition of such guarantees and how they affect the quality measures.
Abstract. Delayed coupling between state variables occurs regularly in technical dynamical systems, especially embedded control. As it is consequently omnipresent in safety-critical domains, there is an increasing interest in the safety verification of systems modelled by Delay Differential Equations (DDEs). In
this paper, we leverage qualitative guarantees for the existence of an exponen- 
tially decreasing estimation on the solutions to DDEs as established in classical 
stability theory, and present a quantitative method for constructing such delay- 
dependent estimations, thereby facilitating a reduction of the verification prob- 
lem over an unbounded temporal horizon to a bounded one. Our technique builds 
on the linearization technique of nonlinear dynamics and spectral analysis of the 
linearized counterparts. We show experimentally on a set of representative bench- 
marks from the literature that our technique indeed extends the scope of bounded 
verification techniques to unbounded verification tasks. Moreover, our technique 
is easy to implement and can be combined with any automatic tool dedicated to 
bounded verification of DDEs. 


Keywords: Unbounded verification · Delay Differential Equations (DDEs) · Safety and stability · Linearization · Spectral analysis


1 Introduction 


The theory of dynamical systems featuring delayed coupling between state variables 
dates back to the 1920s, when Volterra [41,42], in his research on predator-prey mod- 
els and viscoelasticity, formulated some rather general differential equations incor- 
porating the past states of the system. This formulation, now known as delay differ- 
ential equations (DDEs), was developed further by, e.g., Mishkis [30] and Bellman 
and Cooke [2], and has witnessed numerous applications in many domains. Prominent examples include population dynamics [25], where the birth rate follows changes in population size with a delay related to reproductive age; the spreading of infectious diseases [5], where the delay is induced by the incubation period; and networked control systems [21] with their associated transport delays when forwarding data through the communication network. These applications range further to models in optics [23], economics [38], and ecology [13], to name just a few. Albeit resulting in more accurate models, the presence of time delays in feedback dynamics often induces considerable extra complexity when one attempts to design or even verify such dynamical systems. This stems from the fact that feedback delays reduce controllability, since immediate reaction is impossible, and increase the likelihood of transient overshoot or even oscillation in the feedback system, thus violating safety or stability certificates obtained on idealized, delay-free models of systems prone to delayed coupling.

Though established automated methods addressing ordinary differential equations 
(ODEs) and their derived models, like hybrid automata, have been extensively studied in 
the verification literature, techniques pertaining to ODEs do not generalize straightfor- 
wardly to delayed dynamical systems described by DDEs. The reason is that the future 
evolution of a DDE is no longer governed by the current state alone, but depends on a chunk of its historical trajectory, such that introducing even a single constant delay immediately turns a system with finite-dimensional states into an infinite-dimensional dynamical system. There are approximation methods, such as the Padé approximation [39], that approximate DDEs with finite-dimensional models; these, however, may hide fundamental behaviors, e.g., (in)stability, of the original delayed dynamics, as remarked
in Sect. 5.2.2.8.1 of [26]. Consequently, despite well-developed numerical methods for 
solving DDEs as well as methods for stability analysis in the realm of control theory, 
hitherto in automatic verification, only a few approaches address the effects of delays 
due to the immediate impact of delays on the structure of the state spaces to be traversed 
by state-exploratory methods. 

In this paper, we present a constructive approach dedicated to verifying safety prop- 
erties of delayed dynamical systems encoded by DDEs, where the safety properties 
pertain to an infinite time domain. This problem is of particular interest when one
pursues correctness guarantees concerning dynamics of safety-critical systems over a 
long run. Our approach builds on the linearization technique of potentially nonlinear 
dynamics and spectral analysis of the linearized counterparts. We leverage qualitative 
guarantees for the existence of an exponentially decreasing estimation on the solutions 
to DDEs as established in classical stability theory (see, e.g., [2,19,24]), and present 
a quantitative method to construct such estimations, thereby reducing the temporally 
unbounded verification problems to their bounded counterparts. 

The class of systems we consider features delayed differential dynamics governed by DDEs of the form $\dot{x}(t) = f(x(t), x(t - r_1), \ldots, x(t - r_k))$ with initial states specified by a continuous function $\phi(t)$ on $[-r_{\max}, 0]$, where $r_{\max} = \max\{r_1, \ldots, r_k\}$. It thus involves a combination of ODE and DDE with multiple constant delays $r_i > 0$, and has been successfully used to model various real-world systems in the aforementioned fields. In general, formal verification of unbounded safety or, dually, reachability properties of such systems inherits undecidability from similar properties for ODEs (cf., e.g., [14]). We therefore tackle this unbounded verification problem by leveraging a stability criterion of the system under investigation.


Contributions. In this paper, we present a quantitative method for constructing a delay-dependent, exponentially decreasing upper bound, if existent, that encloses trajectories of a DDE originating from a certain set of initial functions. This method consequently yields a temporal bound $T^*$ such that for any $T > T^*$, the system is safe over $[-r_{\max}, T]$ iff it is safe over $[-r_{\max}, \infty)$. For linear dynamics, such an equivalence of safety applies to any initial set of functions drawn from a compact subset of $\mathbb{R}^n$, while for nonlinear dynamics, our approach produces (a subset of) the basin of attraction around a steady state, and therefore a certificate (by bounded verification in finitely many steps) that guarantees the reachable set being contained in this basin suffices to claim safety/unsafety of the system over an infinite time horizon. Our technique is easy
to implement and can be combined with any automatic tool for bounded verification of 
DDEs. We show experimentally on a set of representative benchmarks from the litera- 
ture that our technique effectively extends the scope of bounded verification techniques 
to unbounded verification tasks. 


Related Work. As surveyed in [14], the research community has over the past three 
decades vividly addressed automatic verification of hybrid discrete-continuous systems 
in a safety-critical context. The almost universal undecidability of the unbounded reachability problem, however, confines sound push-button methods to either semi-decision procedures or even approximation schemes, most of which address bounded verification by computing the finite-time image of a set of initial states. It should be obvious that
the functional rather than state-based nature of the initial condition of DDEs prevents a 
straightforward generalization of this approach. 

Prompted by actual engineering problems, the interest in safety verification of continuous or hybrid systems featuring delayed coupling has been increasing recently. We classify these contributions into two tracks. The first track pursues propagation-based bounded verification: Huang et al. presented in [21] a technique for simulation-based time-bounded invariant verification of nonlinear networked dynamical systems with delayed interconnections, by computing bounds on the sensitivity of trajectories to changes in initial states and inputs of the system. A method adopting the paradigm of verification-by-simulation (see, e.g., [9,16,31]) was proposed in [4], which integrates rigorous error
analysis of the numerical solving and the sensitivity-related state bloating algorithms (cf. [7]) to obtain safe enclosures of time-bounded reachable sets for systems modelled by DDEs. In [46], the authors identified a class of DDEs featuring a local homeomorphism property, which facilitates the construction of over- and under-approximations of reachable sets by performing reachability analysis on the boundaries of the initial sets. Goubault et al. presented in [17] a scheme to compute inner- and outer-approximating flowpipes for DDEs with uncertain initial states and parameters using Taylor models combined with space abstraction in the shape of zonotopes. The other track of the literature tackles the unbounded reachability problem of DDEs by taking into account the asymptotic behavior of the dynamics under investigation, captured by, e.g., Lyapunov functions in [32,47] and barrier certificates in [35]. These approaches, however, share a common limitation: a polynomial template has to be specified either for the interval Taylor models exploited in [47] (and its extension [29] to cater for properties specified as bounded metric interval temporal logic (MITL) formulae), for Lyapunov functionals in [32], or for barrier certificates in [35]. Our approach drops this limitation by resorting to the linearization technique followed by spectral analysis of the linearized counterparts, and furthermore extends over [47] by allowing immediate feedback (i.e., $x(t)$) as well as multiple delays in the dynamics, to which their technique does not generalize immediately. In contrast to the absolute stability exploited in [32], namely a criterion that ensures stability for arbitrarily large delays, we give the construction of a delay-dependent stability certificate, thereby substantially increasing the scope of dynamics amenable to stability criteria, for instance, the famous Wright's equation (cf. [44]). Finally, we refer the reader to [34] and [33] for related contributions showing the existence of abstract symbolic models for nonlinear control systems with time-varying and unknown time-delay signals via approximate bisimulations.


2 Problem Formulation 


Notations. Let $\mathbb{N}$, $\mathbb{R}$ and $\mathbb{C}$ be the sets of natural, real and complex numbers, respectively. Vectors will be denoted by boldface letters. For $z = a + ib \in \mathbb{C}$ with $a, b \in \mathbb{R}$, the real and imaginary parts of $z$ are denoted respectively by $\Re(z) = a$ and $\Im(z) = b$; $|z| = \sqrt{a^2 + b^2}$ is the modulus of $z$. For a vector $x \in \mathbb{R}^n$, $x_i$ refers to its $i$-th component, and its maximum norm is denoted by $\|x\| = \max_{1 \leq i \leq n} |x_i|$. We define for $\delta > 0$, $B(x, \delta) = \{x' \in \mathbb{R}^n \mid \|x' - x\| \leq \delta\}$ as the $\delta$-closed ball around $x$. The notation $\|\cdot\|$ extends to a set $X \subseteq \mathbb{R}^n$ as $\|X\| = \sup_{x \in X} \|x\|$, and to an $m \times n$ complex-valued matrix $A$ as $\|A\| = \max_{1 \leq i \leq m} \sum_{j=1}^{n} |a_{ij}|$. $\overline{X}$ is the closure of $X$ and $\partial X$ denotes the boundary of $X$. For $a < b$, let $C^0([a, b], \mathbb{R}^n)$ denote the space of continuous functions from $[a, b]$ to $\mathbb{R}^n$, which is associated with the maximum norm $\|f\| = \max_{t \in [a, b]} \|f(t)\|$. We abbreviate $C^0([-r, 0], \mathbb{R}^n)$ as $C_r$ for a fixed positive constant $r$, and let $C^1$ consist of all continuously differentiable functions. Given a measurable function $f\colon [0, \infty) \to \mathbb{R}^n$ such that $\|f(t)\| \leq a e^{bt}$ for some constants $a$ and $b$, the Laplace transform $\mathcal{L}\{f\}$ defined by $\mathcal{L}\{f\}(z) = \int_0^{\infty} e^{-zt} f(t)\, dt$ exists and is an analytic function of $z$ for $\Re(z) > b$.


Delayed Differential Dynamics. We consider a class of dynamical systems featuring delayed differential dynamics governed by DDEs of autonomous type:

$$\begin{cases} \dot{x}(t) = f\big(x(t), x(t - r_1), \ldots, x(t - r_k)\big), & t \in [0, \infty), \\ x(t) = \phi(t), & t \in [-r_k, 0], \end{cases} \tag{1}$$

where $x$ is the time-dependent state vector in $\mathbb{R}^n$, $\dot{x}$ denotes its temporal derivative $dx/dt$, and $t$ is a real variable modelling time. The discrete delays are assumed to be ordered as $r_k > \cdots > r_1 > 0$, and the initial states are specified by a vector-valued function $\phi \in C_{r_k}$.

Suppose $f$ is a Lipschitz-continuous vector-valued function in $C^1(\mathbb{R}^{(k+1)n}, \mathbb{R}^n)$, which implies that the system has a unique maximal solution (or trajectory) from a given initial condition $\phi \in C_{r_k}$, denoted as $\xi_\phi\colon [-r_k, \infty) \to \mathbb{R}^n$. We denote in the sequel by $f_x = \frac{\partial f}{\partial x(t)}$ the Jacobian matrix (i.e., the matrix consisting of all first-order partial derivatives) of $f$ w.r.t. the component $x(t)$. Similar notations apply to the components $x(t - r_i)$, for $i = 1, \ldots, k$.


Example 1 (Gene regulation [12,36]). The control of gene expression in cells is often modelled with time delays in equations of the form

$$\begin{cases} \dot{x}_1(t) = g\big(x_n(t - r_n)\big) - \beta_1 x_1(t), & \\ \dot{x}_j(t) = x_{j-1}(t - r_{j-1}) - \beta_j x_j(t), & 1 < j \leq n, \end{cases} \tag{2}$$

where the gene is transcribed producing mRNA ($x_1$), which is translated into enzyme $x_2$ that in turn produces another enzyme $x_3$, and so on. The end product $x_n$ acts to repress the transcription of the gene by $g < 0$. Time delays are introduced to account for the time involved in transcription, translation, and transport. The positive $\beta_j$'s represent decay rates of the species. The dynamics described in Eq. (2) fall exactly into the scope of systems considered in this paper; in fact, they instantiate a more general family of systems known as monotone cyclic feedback systems (MCFS) [28], which includes neural networks, testosterone control, and many other effects in systems biology.


Lyapunov Stability. Given a system of DDEs in Eq. (1), suppose $f$ has a steady state (a.k.a. equilibrium) at $x_e$ such that $f(x_e, \ldots, x_e) = 0$; then

– $x_e$ is said to be Lyapunov stable if for every $\epsilon > 0$ there exists $\delta > 0$ such that, if $\|\phi - x_e\| < \delta$, then for every $t > 0$ we have $\|\xi_\phi(t) - x_e\| < \epsilon$.

– $x_e$ is said to be asymptotically stable if it is Lyapunov stable and there exists $\delta > 0$ such that, if $\|\phi - x_e\| < \delta$, then $\lim_{t \to \infty} \|\xi_\phi(t) - x_e\| = 0$.

– $x_e$ is said to be exponentially stable if it is asymptotically stable and there exist $\alpha, \beta, \delta > 0$ such that, if $\|\phi - x_e\| < \delta$, then $\|\xi_\phi(t) - x_e\| \leq \alpha \|\phi - x_e\| e^{-\beta t}$ for all $t > 0$. The constant $\beta$ is called the rate of convergence.

Here $x_e$ can be generalized to a constant function in $C_{r_k}$ when employing the supremum norm $\|\phi - x_e\|$ over functions. This norm further yields the locality of the above definitions, i.e., they describe the behavior of a system near an equilibrium, rather than for all initial conditions $\phi \in C_{r_k}$, in which case it is termed global stability. W.l.o.g., we assume $f(0, \ldots, 0) = 0$ in the sequel and investigate the stability of the zero equilibrium thereof. Any nonzero equilibrium can be straightforwardly shifted to a zero one by a coordinate transformation while preserving the stability properties, see, e.g., [19].


Safety Verification Problem. Given $X \subseteq \mathbb{R}^n$ a compact set of initial states and $U \subseteq \mathbb{R}^n$ a set of unsafe or otherwise bad states, a delayed dynamical system of the form (1) is said to be $T$-safe iff all trajectories originating from any $\phi(t)$ satisfying $\phi(t) \in X, \forall t \in [-r_k, 0]$ do not intersect with $U$ at any $t \in [-r_k, T]$, and $T$-unsafe otherwise. In particular, we distinguish unbounded verification with $T = \infty$ from bounded verification with $T < \infty$.

In subsequent sections, we first present our approach to tackling the safety verifica- 
tion problem of delayed differential dynamics coupled with one single constant delay 
(i.e., k = 1 in Eq. (1)) in an unbounded time domain, by leveraging a quantitative 
stability criterion, if existent, for the linearized counterpart of the potentially nonlinear
dynamics in question. A natural extension of this approach to cater for dynamics with 
multiple delay terms will be remarked thereafter. In what follows, we start the elabo- 
ration of the method from DDEs of linear dynamics that admit spectral analysis, and 
move to nonlinear cases afterwards and show how the linearization technique can be 
exploited therein. 


3 Linear Dynamics 


Consider the linear sub-class of dynamics given in Eq. (1):

$$\begin{cases} \dot{x}(t) = Ax(t) + Bx(t - r), & t \in [0, \infty), \\ x(t) = \phi(t), & t \in [-r, 0], \end{cases} \tag{3}$$

where $A, B \in \mathbb{R}^{n \times n}$, $\phi \in C_r$, and the system is associated with the characteristic equation

$$\det\left(zI - A - Be^{-rz}\right) = 0, \tag{4}$$

where $I$ is the $n \times n$ identity matrix. Denote by $h(z) = zI - A - Be^{-rz}$ the characteristic matrix in the sequel. Notice that the characteristic equation can be obtained by seeking nontrivial solutions to Eq. (3) of the form $x(t) = c\,e^{zt}$, where $c$ is an $n$-dimensional nonzero constant vector.
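For completeness, the substitution behind Eq. (4) can be written out; it uses only the linear dynamics of Eq. (3) and the ansatz $x(t) = c\,e^{zt}$ with $c \neq 0$ mentioned above.

```latex
% Substituting x(t) = c e^{zt} (c \neq 0) into \dot{x}(t) = A x(t) + B x(t - r):
\begin{aligned}
z\,c\,e^{zt} &= A\,c\,e^{zt} + B\,c\,e^{z(t - r)} \\
\Longleftrightarrow\quad \left(zI - A - B e^{-rz}\right) c\,e^{zt} &= 0 \\
\Longleftrightarrow\quad h(z)\,c &= 0 \qquad (\text{since } e^{zt} \neq 0).
\end{aligned}
```

A nonzero $c$ solving the last line exists iff $h(z)$ is singular, i.e., iff $\det(h(z)) = 0$, which is exactly Eq. (4).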

The roots $\lambda \in \mathbb{C}$ of Eq. (4) are called characteristic roots or eigenvalues, and the set of all eigenvalues is referred to as the spectrum, denoted by $\sigma = \{\lambda \mid \det(h(\lambda)) = 0\}$. Due to the exponential term in the characteristic equation, the DDE possibly has, in line with its infinite-dimensional nature, infinitely many eigenvalues, making a spectral analysis more involved. The spectrum does, however, enjoy some elementary properties that can be exploited in the analysis. For instance, the spectrum has no finite accumulation point in $\mathbb{C}$, and therefore for each positive $\gamma \in \mathbb{R}$, the number of roots satisfying $|\lambda| \leq \gamma$ is finite. It follows that the spectrum is a countable (albeit possibly infinite) set:


Lemma 1 (Accumulation freedom [6,19]). Given $\gamma \in \mathbb{R}$, there are at most finitely many characteristic roots satisfying $\Re(\lambda) > \gamma$. If there is a sequence $\{\lambda_n\}$ of roots of Eq. (4) such that $|\lambda_n| \to \infty$ as $n \to \infty$, then $\Re(\lambda_n) \to -\infty$ as $n \to \infty$.


Lemma 1 suggests that there are only finitely many roots in any vertical strip of the complex plane, and there thus exists an upper bound $\alpha \in \mathbb{R}$ such that every characteristic root $\lambda$ in the spectrum satisfies $\Re(\lambda) < \alpha$. This upper bound essentially captures the asymptotic behavior of the linear dynamics:


Theorem 1 (Globally exponential stability [6,36]). Suppose $\Re(\lambda) < \alpha$ for every characteristic root $\lambda$. Then there exists $K > 0$ such that

$$\|\xi_\phi(t)\| \leq K \|\phi\| e^{\alpha t}, \quad \forall t \geq 0, \ \forall \phi \in C_r, \tag{5}$$

where $\xi_\phi(t)$ is the solution to Eq. (3). In particular, $x = 0$ is a globally exponentially stable equilibrium of Eq. (3) if $\Re(\lambda) < 0$ for every characteristic root; it is unstable if there is a root satisfying $\Re(\lambda) > 0$.
Theorem 1 establishes an existential guarantee that the solution to the linear delayed dynamics approaches the zero equilibrium exponentially for any initial condition in $C_r$. To achieve automatic safety verification, however, we ought to find a constructive means of estimating the (signed) rate of convergence $\alpha$ and the coefficient $K$ in Eq. (5). This motivates the introduction of the so-called fundamental solution to Eq. (3), whose Laplace transform will later be shown to be $h^{-1}(z)$, the inverse characteristic matrix, which always exists for $z$ satisfying $\Re(z) > \max_{\lambda \in \sigma} \Re(\lambda)$.


Lemma 2 (Variation-of-constants [19,36]). Let $\xi_\phi(t)$ be the solution to Eq. (3). Denote by $\xi_{\phi'}(t)$ the solution that satisfies Eq. (3) for $t \geq 0$ and satisfies a variation of the initial condition as $\phi'(0) = I$ and $\phi'(t) = O$ for all $t \in [-r, 0)$, where $O$ is the $n \times n$ zero matrix; then for $t \geq 0$,

$$\xi_\phi(t) = \xi_{\phi'}(t)\,\phi(0) + \int_{-r}^{0} \xi_{\phi'}(t - \tau - r)\, B\, \phi(\tau)\, d\tau. \tag{6}$$

Note that in Eq. (6), $\phi(t)$ is extended to $[-r, \infty)$ by making it zero for $t > 0$. In spite of the discontinuity of $\phi'$ at zero, the existence of the solution $\xi_{\phi'}(t)$ can be proven by the well-known method of steps [8].
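To make the method of steps concrete, here is a minimal numerical sketch (a plain simulation, not a verified enclosure) that integrates the linear DDE (3) with the explicit Euler scheme. It assumes that the delay $r$ is an integer multiple of the step size $h$, so that the delayed state is always an already-computed grid value; the function names, the step size and the constant initial functions in the usage below are illustrative choices.

```python
import math
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def mat_vec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def simulate_linear_dde(a: Matrix, b: Matrix, r: float,
                        phi: Callable[[float], Vector],
                        t_end: float, h: float = 1e-3) -> List[Vector]:
    """Explicit Euler for x'(t) = A x(t) + B x(t - r), x = phi on [-r, 0].

    Returns the approximations of x(0), x(h), ..., x(t_end).
    """
    lag = round(r / h)
    if not math.isclose(lag * h, r, abs_tol=1e-9):
        raise ValueError("r must be an integer multiple of h")
    # Grid values of the initial function on [-r, 0].
    xs = [list(phi(-r + k * h)) for k in range(lag + 1)]
    for _ in range(round(t_end / h)):
        x_now = xs[-1]
        x_delayed = xs[-1 - lag]  # x(t - r) is already known (method of steps)
        ax, bx = mat_vec(a, x_now), mat_vec(b, x_delayed)
        xs.append([x + h * (p + q) for x, p, q in zip(x_now, ax, bx)])
    return xs[lag:]  # drop the history part before t = 0
```

For instance, the PD-controller of Example 2 below corresponds to `simulate_linear_dde([[0, 1], [0, 0]], [[0, 0], [-2, -3]], 0.35, lambda t: [0.1, 0.1], 20.0)`. Such simulations can expose unsafe behavior of individual trajectories but cannot establish safety for a whole set of initial functions.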


Lemma 3 (Fundamental solution [19]). The solution $\xi_{\phi'}(t)$ to Eq. (3) with initial data $\phi'$ is the fundamental solution; that is, for $z$ s.t. $\Re(z) > \max_{\lambda \in \sigma} \Re(\lambda)$,

$$\mathcal{L}\{\xi_{\phi'}\}(z) = h^{-1}(z).$$


The fundamental solution $\xi_{\phi'}(t)$ can be proven to share the same exponential bound as that in Theorem 1, while the following theorem, as a consequence of Lemma 2, gives an exponential estimation of $\xi_\phi(t)$ in connection with $\xi_{\phi'}(t)$:


Theorem 2 (Exponential estimation [36]). Denote by $\mu = \max_{\lambda \in \sigma} \Re(\lambda)$ the maximum real part of eigenvalues in the spectrum. Then for any $\alpha > \mu$, there exists $K > 0$ such that

$$\|\xi_{\phi'}(t)\| \leq K e^{\alpha t}, \quad \forall t \geq 0, \tag{7}$$

and hence by Eq. (6), $\|\xi_\phi(t)\| \leq K\left(1 + \|B\| \int_0^r e^{-\alpha\tau}\, d\tau\right)\|\phi\|\, e^{\alpha t}$ for any $t \geq 0$ and $\phi \in C_r$. In particular, $x = 0$ is globally exponentially stable for Eq. (3) if $\mu < 0$.
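The step from Eqs. (6) and (7) to the bound on $\xi_\phi(t)$ can be spelled out as follows. It uses only the sub-multiplicativity of the induced matrix norm, the bound (7), and the fact that $\xi_{\phi'}$ vanishes on $[-r, 0)$, so that (7) trivially also holds at negative arguments.

```latex
\begin{aligned}
\|\xi_\phi(t)\|
  &\leq \|\xi_{\phi'}(t)\|\,\|\phi(0)\|
     + \int_{-r}^{0} \|\xi_{\phi'}(t - \tau - r)\|\,\|B\|\,\|\phi(\tau)\|\, d\tau \\
  &\leq K e^{\alpha t}\|\phi\|
     + \|B\|\,\|\phi\| \int_{-r}^{0} K e^{\alpha (t - \tau - r)}\, d\tau \\
  &= K\Big(1 + \|B\| \int_{0}^{r} e^{-\alpha s}\, ds\Big)\|\phi\|\, e^{\alpha t}
     \qquad (s = \tau + r).
\end{aligned}
```

For $\alpha \neq 0$ the remaining integral has the closed form $\int_0^r e^{-\alpha s}\, ds = (1 - e^{-\alpha r})/\alpha$.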


Following Theorem 2, an exponentially decreasing bound on the solution $\xi_\phi(t)$ to linear DDEs of the form (3) can be assembled by computing $\alpha$ satisfying $\mu < \alpha < 0$ and the coefficient $K > 0$.


3.1 Identifying the Rightmost Roots 


Due to the significance of characteristic roots in the context of stability and bifurcation analysis, numerical methods for identifying the roots, particularly the rightmost ones, of linear (or linearized) DDEs have been extensively studied in the past few decades, see, e.g., [3,11,43,45]. There are indeed complete methods for isolating real roots of polynomial exponential functions, for instance [37] and [15], based on cylindrical algebraic decomposition (CAD). Nevertheless, as soon as non-trivial exponential functions arise in the characteristic equation, there appear to be few, if any, symbolic approaches to detecting complex roots of the equation.

In this paper, we find $\alpha$ that bounds the spectrum from the right of the complex plane by resorting to the numerical approach developed in [11]. The computation therein employs discretization of the solution operator using linear multistep (LMS) methods to approximate eigenvalues of linear DDEs with multiple constant delays, under an absolute error of $O(\tau^p)$ for sufficiently small stepsize $\tau$, where $O(\cdot)$ is the big-O notation and $p$ depends on the order of the LMS methods. A well-developed MATLAB package called DDE-BIFTOOL [10] is furthermore available to mechanize the computation, which will be demonstrated in our forthcoming examples.
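As a rough, self-contained illustration of what such a root search returns, the sketch below locates roots of the characteristic function of the PD-controller from Example 2 below. It is not the LMS-based discretization of [11] and does not use DDE-BIFTOOL: it simply runs Newton's method from a grid of starting points, which can miss roots and gives no error guarantee. The expansion $\det(h(z)) = z^2 + (3z + 2)e^{-rz}$ is derived by hand for $A = \begin{pmatrix}0 & 1\\ 0 & 0\end{pmatrix}$, $B = \begin{pmatrix}0 & 0\\ -2 & -3\end{pmatrix}$ and $r = 0.35$; the grid bounds and tolerances are arbitrary assumptions.

```python
import cmath
from typing import List, Optional, Tuple

R = 0.35  # delay of the PD-controller example


def char_fn(z: complex) -> complex:
    """det(zI - A - B e^{-rz}) = z^2 + (3z + 2) e^{-rz} for the PD example."""
    return z * z + (3 * z + 2) * cmath.exp(-R * z)


def char_fn_prime(z: complex) -> complex:
    """Derivative of char_fn with respect to z."""
    e = cmath.exp(-R * z)
    return 2 * z + 3 * e - R * (3 * z + 2) * e


def newton(z: complex, tol: float = 1e-12, max_iter: int = 100) -> Optional[complex]:
    """Newton iteration; returns None if it diverges or stalls."""
    for _ in range(max_iter):
        if abs(z) > 1e3 or z.real < -50:  # keep exp(-R z) from overflowing
            return None
        d = char_fn_prime(z)
        if d == 0:
            return None
        step = char_fn(z) / d
        z -= step
        if abs(step) < tol:
            return z if abs(char_fn(z)) < 1e-8 else None
    return None


def rightmost_real_part(re_min: float = -3.0, re_max: float = 1.0,
                        im_max: float = 20.0, n: int = 41) -> Tuple[float, List[complex]]:
    """Heuristic estimate of max Re(lambda) over roots reached from a start grid."""
    roots = []
    for i in range(n):
        for j in range(n):
            z0 = complex(re_min + (re_max - re_min) * i / (n - 1), im_max * j / (n - 1))
            root = newton(z0)
            if root is not None:
                roots.append(root)
    return max(root.real for root in roots), roots


mu_estimate, found = rightmost_real_part()
print(mu_estimate)
```

One would expect the printed estimate to lie slightly below $-0.5$, in line with the bound $\alpha = -0.5$ used in Example 2; a rigorous treatment still needs a method with error control such as [11].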


3.2 Constructing K 


By the inverse Laplace transform (cf. [19, 5.2] for a detailed proof), we have $\xi_{\phi'}(t) = \lim_{V \to \infty} \frac{1}{2\pi i} \int_{\alpha - iV}^{\alpha + iV} e^{zt} h^{-1}(z)\, dz$ for $z$ satisfying $\Re(z) = \alpha > \mu$, where $\alpha$ is the exponent associated with the bound on $\xi_{\phi'}(t)$ in Eq. (7), and hence, by substituting $z = \alpha + iv$, we have


$$e^{-\alpha t}\, \xi_{\phi'}(t) = \lim_{V \to \infty} \frac{1}{2\pi} \int_{-V}^{V} e^{ivt}\, h^{-1}(\alpha + iv)\, dv.$$


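Since $h^{-1}(z) = \frac{1}{z}\left(I + h^{-1}(z)\left(A + Be^{-rz}\right)\right) = \frac{I}{z} + O\!\left(\frac{1}{z^2}\right)$, where $O(1/z^2)$ denotes the remainder $h^{-1}(z) - I/z$, together with the fact that an integral over an integrand decaying quadratically is convergent, it follows that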


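$$e^{-\alpha t}\, \xi_{\phi'}(t) = \lim_{V \to \infty} \frac{1}{2\pi} \int_{-V}^{V} \frac{e^{ivt}}{\alpha + iv}\, I\, dv + \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{ivt}\, O\!\left(\frac{1}{(\alpha + iv)^2}\right) dv.$$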


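By taking the norm while observing that $|e^{ivt}| = 1$, we get

$$e^{-\alpha t}\, \|\xi_{\phi'}(t)\| \leq \underbrace{\left|\lim_{V \to \infty} \frac{1}{2\pi} \int_{-V}^{V} \frac{e^{ivt}}{\alpha + iv}\, dv\right|}_{\text{(8-a)}} + \underbrace{\frac{1}{2\pi} \int_{-\infty}^{\infty} \left\|O\!\left(\frac{1}{(\alpha + iv)^2}\right)\right\| dv}_{\text{(8-b)}}. \tag{8}$$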
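For the integral (8-a), the fact¹ that

$$\int_{-\infty}^{\infty} \frac{e^{iax}}{b + ix}\, dx = \int_{-\infty}^{\infty} \frac{e^{ix}}{ab + ix}\, dx = \begin{cases} 2\pi e^{-ab}, & \text{if } a > 0, b > 0, \\ 0, & \text{if } a > 0, b < 0, \end{cases} \tag{9}$$

implies

$$\lim_{V \to \infty} \frac{1}{2\pi} \int_{-V}^{V} \frac{e^{ivt}}{\alpha + iv}\, dv = \begin{cases} e^{-\alpha t}, & \forall t > 0, \ \forall \alpha > 0, \\ 0, & \forall t > 0, \ \forall \alpha < 0. \end{cases} \tag{10}$$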
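As a hedged numerical sanity check of Eq. (10), which is not part of the construction itself, the truncated integral can be approximated by the trapezoidal rule on a symmetric grid. The truncation bound $V = 1000$ and the grid size are arbitrary choices; the neglected tail decays roughly like $1/(Vt)$, so agreement is only to a few decimal places.

```python
import cmath
import math


def truncated_integral(alpha: float, t: float, v_max: float = 1000.0,
                       n: int = 200_000) -> complex:
    """Trapezoidal rule for (1/2pi) * int_{-V}^{V} e^{ivt} / (alpha + iv) dv."""
    dv = 2.0 * v_max / n
    total = 0j
    for k in range(n + 1):
        v = -v_max + k * dv
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * cmath.exp(1j * v * t) / (alpha + 1j * v)
    return total * dv / (2.0 * math.pi)


# alpha > 0: expected to be close to e^{-alpha t}; alpha < 0: close to 0.
print(abs(truncated_integral(0.5, 1.0) - math.exp(-0.5)))
print(abs(truncated_integral(-0.5, 1.0)))
```

The sign of $\alpha$ decides whether the pole $v = i\alpha$ lies in the upper half-plane, which is exactly the case distinction in Eq. (9).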


Notice that the second integral (8-b) is computable, since it is convergent and independent of $t$. The underlying computation of the improper integral, however, can be rather time-consuming. We therefore take a detour and compute, by Lemma 4, an upper bound of (8-b) in the form of a definite integral, which suffices to constitute an exponential estimation of $\xi_{\phi'}(t)$ while reducing the computational effort pertinent to the integration.


¹ The integral in (9) is divergent for $a = 0$ or $b = 0$ in the sense of a Riemann integral.
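Lemma 4. There exists $M > 0$ such that inequality (11) below holds for any $\alpha > \mu$:

$$\int_{-\infty}^{\infty} \left\|O\!\left(\frac{1}{(\alpha + iv)^2}\right)\right\| dv \leq \int_{-M}^{M} \left\|O\!\left(\frac{1}{(\alpha + iv)^2}\right)\right\| dv + \frac{8n}{M}\left(\|A\| + \|B\| e^{-r\alpha}\right), \tag{11}$$

where $\mu = \max_{\lambda \in \sigma} \Re(\lambda)$, $z = \alpha + iv$, and $n$ is the order of $A$ and $B$.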


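Proof. The proof depends essentially on constructing a threshold $M > 0$ such that the integral over $|v| > M$ can be bounded, thus transforming the improper integral in question to a definite one. To find such an $M$, observe that

$$\left\|O\!\left(\frac{1}{z^2}\right)\right\| = \left\|\frac{1}{z}\, h^{-1}(z)\left(A + Be^{-rz}\right)\right\| \leq \frac{\|h^{-1}(z)\|}{|z|}\left(\|A\| + \|B\| e^{-r\alpha}\right).$$

Without loss of generality, suppose the entry of $h^{-1}(z)$ at $(i, j)$ takes the form

$$\left(h^{-1}(z)\right)_{ij} = \frac{\sum_{k=0}^{n-1} p^{ij}_k(e^{-rz})\, z^k}{\det(h(z))} = \frac{\sum_{k=0}^{n-1} p^{ij}_k(e^{-rz})\, z^k}{z^n + \sum_{k=0}^{n-1} q_k(e^{-rz})\, z^k} = \frac{1}{z} \cdot \frac{p^{ij}_{n-1}(e^{-rz}) + \sum_{k=0}^{n-2} p^{ij}_k(e^{-rz})\, z^{k-n+1}}{1 + \sum_{k=0}^{n-1} q_k(e^{-rz})\, z^{k-n}},$$

where $p^{ij}_k(\cdot)$ and $q_k(\cdot)$ are polynomials in $e^{-rz}$ serving as coefficients of $z^k$. Since $|e^{-rz}| = e^{-r\alpha}$ along the vertical line $z = \alpha + iv$, we can conclude that there exist $P^{ij}_k$ and $Q_k$ such that $|p^{ij}_k(e^{-rz})| \leq P^{ij}_k$ and $|q_k(e^{-rz})| \leq Q_k$, with $P^{ij}_{n-1} = 1$ if $i = j$, and $0$ otherwise. Furthermore, on the vertical line $z = \alpha + iv$, if $|v| \geq 1$ (and hence $|z| \geq 1$), then

$$\left|\sum_{k=0}^{n-2} p^{ij}_k(e^{-rz})\, z^{k-n+1}\right| \leq \sum_{k=0}^{n-2} P^{ij}_k\, |z^{-1}|, \qquad \left|1 + \sum_{k=0}^{n-1} q_k(e^{-rz})\, z^{k-n}\right| \geq 1 - \sum_{k=0}^{n-1} Q_k\, |z^{-1}|.$$

We can thus choose $|v| > M = \max_{1 \leq i, j \leq n} \left\{1,\ 2\sum_{k=0}^{n-1} Q_k,\ \sum_{k=0}^{n-2} P^{ij}_k\right\}$, which implies

$$\left|\left(h^{-1}(z)\right)_{ij}\right| \leq \frac{1}{|z|} \cdot \frac{P^{ij}_{n-1} + \sum_{k=0}^{n-2} P^{ij}_k\, |z^{-1}|}{1 - \sum_{k=0}^{n-1} Q_k\, |z^{-1}|} \leq \frac{1}{|z|} \cdot \frac{1 + 1}{1 - \frac{1}{2}} = \frac{4}{|z|},$$

where the last inequality holds since $|z| \geq |v| > M$. It then follows, if $|v| > M$, that

$$\left\|O\!\left(\frac{1}{z^2}\right)\right\| \leq \frac{\|h^{-1}(z)\|}{|z|}\left(\|A\| + \|B\| e^{-r\alpha}\right) \leq \frac{4n}{|z|^2}\left(\|A\| + \|B\| e^{-r\alpha}\right),$$

and thereby, since $|z|^2 \geq v^2$,

$$\int_{|v| > M} \left\|O\!\left(\frac{1}{z^2}\right)\right\| dv \leq 2\int_{M}^{\infty} \frac{4n}{v^2}\left(\|A\| + \|B\| e^{-r\alpha}\right) dv = \frac{8n}{M}\left(\|A\| + \|B\| e^{-r\alpha}\right).$$

This completes the proof.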
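For the PD-controller of Example 2 below ($n = 2$, $r = 0.35$, $\alpha = -0.5$), the bounds $Q_k$ and $P^{ij}_k$ from the proof can be read off by hand from $\det(h(z)) = z^2 + 3e^{-rz}z + 2e^{-rz}$ and $\operatorname{adj}(h(z)) = \begin{pmatrix} z + 3e^{-rz} & 1 \\ -2e^{-rz} & z \end{pmatrix}$. The sketch below hard-codes these hand-derived coefficients (it does not expand the adjugate symbolically), evaluates the threshold $M = \max\{1, 2\sum_k Q_k, \sum_k P^{ij}_k\}$, and spot-checks the estimate $\|h^{-1}(z)\| \leq 4n/|z|$ at a few points with $|v| > M$. It is an illustration for this one example only.

```python
import cmath
import math

R, ALPHA, N = 0.35, -0.5, 2
E = math.exp(-R * ALPHA)  # |e^{-rz}| on the vertical line z = ALPHA + iv

# det h(z) = z^2 + q_1 z + q_0 with q_1 = 3 e^{-rz} and q_0 = 2 e^{-rz}.
Q = [2 * E, 3 * E]
# Constant (k = n - 2 = 0) coefficients of adj h(z) = [[z + 3e, 1], [-2e, z]].
P0 = [[3 * E, 1.0], [2 * E, 0.0]]

M = max([1.0, 2 * sum(Q)] + [P0[i][j] for i in range(N) for j in range(N)])


def inv_h(z: complex) -> list:
    """h^{-1}(z) for the PD example via the 2x2 adjugate formula."""
    e = cmath.exp(-R * z)
    det = z * z + (3 * z + 2) * e
    adj = [[z + 3 * e, 1], [-2 * e, z]]
    return [[entry / det for entry in row] for row in adj]


def max_row_sum(mat: list) -> float:
    """Matrix norm induced by the vector maximum norm."""
    return max(sum(abs(entry) for entry in row) for row in mat)


# Spot-check ||h^{-1}(z)|| <= 4n/|z| at a few points with |v| > M.
sample_v = [M + 0.01, M + 1.0, 2 * M, 10 * M, -(M + 0.01), -5 * M]
bound_holds = all(
    max_row_sum(inv_h(complex(ALPHA, v))) <= 4 * N / abs(complex(ALPHA, v))
    for v in sample_v
)
print(M, bound_holds)
```

Here $2\sum_k Q_k = 10e^{0.175}$ dominates the maximum, which agrees with the value $M = 11.9125$ reported in Example 2.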
Equations (8), (10) and (11) yield that $e^{-\alpha t}\|\xi_{\phi'}(t)\|$ is upper-bounded by

$$K = \frac{1}{2\pi}\left(\int_{-M}^{M} \left\|O\!\left(\frac{1}{(\alpha + iv)^2}\right)\right\| dv + \frac{8n}{M}\left(\|A\| + \|B\| e^{-r\alpha}\right)\right) + \mathbb{1}_0(\alpha) \tag{12}$$

for all $t > 0$, where (8-a) is bounded using $e^{-\alpha t} \leq 1$ for $\alpha > 0$ and $t > 0$. Here $M$ is the constant given in Lemma 4, while $\mathbb{1}_0\colon (\mu, \infty) \setminus \{0\} \to \{0, 1\}$ is an indicator function² of $\{\alpha \mid \alpha > 0\}$, i.e., $\mathbb{1}_0(\alpha) = 1$ for $\alpha > 0$ and $\mathbb{1}_0(\alpha) = 0$ for $\mu < \alpha < 0$.

In contrast to the existential estimation guarantee established in Theorem 2, exploiting the construction of $\alpha$ and $K$ gives a constructive quantitative criterion that permits reducing an unbounded safety verification problem to its bounded counterpart:

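Theorem 3 (Equivalence of bounded and unbounded safety). Given $X \subseteq \mathbb{R}^n$ a set of initial states and $U \subseteq \mathbb{R}^n$ a set of bad states satisfying $0 \notin U$, suppose we have $\alpha$ satisfying $\mu < \alpha < 0$ and $K$ from Eq. (12). Let $\bar{K} = K\left(1 + \|B\| \int_0^r e^{-\alpha\tau}\, d\tau\right)\|X\|$; then there exists $T^* < \infty$, defined as

$$T^* = \max\left\{0,\ \inf\left\{T \mid \forall t > T\colon \left[-\bar{K}e^{\alpha t}, \bar{K}e^{\alpha t}\right]^n \cap U = \emptyset\right\}\right\}, \tag{13}$$

such that for any $T > T^*$, the system (3) is $\infty$-safe iff it is $T$-safe.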


Proof. The “only if” part is for free, as $\infty$-safety by definition subsumes $T$-safety. For the “if” direction, the constructed $K$ in Eq. (12) suffices as an upper bound of $e^{-\alpha t}\|\xi_{\phi'}(t)\|$, and hence by Theorem 2, $\|\xi_\phi(t)\| \leq \bar{K} e^{\alpha t}$ for any $t > 0$ and $\phi$ constrained by $X$. As a consequence, it suffices to show that $T^*$ given by Eq. (13) is finite, which then by definition implies that system (3) is safe over $t > T^*$. Note that the assumption $0 \notin U$ implies that there exists a ball $B(0, \delta)$ such that $B(0, \delta) \cap U = \emptyset$. Moreover, $\bar{K} e^{\alpha t}$ is strictly monotonically decreasing w.r.t. $t$, and thus $T = \max\{0, \ln(\delta / \bar{K})/\alpha\}$ is an upper bound³ of $T^*$, which further implies $T^* < \infty$.


Example 2 (PD-controller [17]). Consider a PD-controller with linear dynamics defined, for $t \geq 0$, as

$$\dot{y}(t) = v(t); \qquad \dot{v}(t) = -k_p\left(y(t - r) - y^*\right) - k_d\, v(t - r), \tag{14}$$

which controls the position $y$ and velocity $v$ of an autonomous vehicle by adjusting its acceleration according to the current distance to a reference position $y^*$. A constant time delay $r$ is introduced to model the time lag due to sensing, computation, transmission, and/or actuation. We instantiate the parameters following [17] as $k_p = 2$, $k_d = 3$, $y^* = 1$, and $r = 0.35$. The system described by Eq. (14) then has one equilibrium at $(1, 0)$, whose stability is equivalent to that of the zero equilibrium of the following system, with $\hat{y} = y - 1$ and $\hat{v} = v$:

$$\dot{\hat{y}}(t) = \hat{v}(t); \qquad \dot{\hat{v}}(t) = -2\hat{y}(t - r) - 3\hat{v}(t - r). \tag{15}$$

² We rule out the case $\alpha = 0$, which renders the integral in Eq. (12) divergent.
³ Note that the larger $\delta$ is, the tighter the bound $T$ will be.


Suppose we are interested in exploiting the safety property of the system (15) in an 
unbounded time domain, relative to the set of initial states Y = [—0.1,0.1] x [0,0.1] 
and the set of unsafe states U = {(g;0) | |g| > 0.2}. Following our construction 
process, we obtain automatically some key arguments (depicted in Fig. 1) as a = —0.5, 
M = 11.9125, K = 7.59162 and K = 2.21103, which consequently yield T* = 
4.80579 s. By Theorem 3, the unbounded safety verification problem thus is reduced to 
a T-bounded one for any T > T*, inasmuch as oo-safety is equivalent to T-safety for 
the underlying dynamics. 
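The reported numbers can be cross-checked with a few lines of arithmetic. The sketch below recomputes $\hat K$ from Theorem 3 and the infimum in Eq. (13) from the reported $K$ and $\alpha$. It assumes, which this section does not state explicitly, that $\|\cdot\|$ is the $\infty$-norm (so $\|B\|_\infty = 5$ and $\|\mathcal{X}\|_\infty = 0.1$); under that assumption it reproduces the reported $\hat K$ and $T^*$. Because $\mathcal{U}$ only constrains $\hat y$, the box in Eq. (13) misses $\mathcal{U}$ exactly when $\hat K e^{\alpha t} \le 0.2$, which gives $T^*$ in closed form.

```python
import math

# Constants reported for the PD-controller (Example 2).
K, alpha, r = 7.59162, -0.5, 0.35
B = [[0.0, 0.0], [-2.0, -3.0]]       # coefficient of the delayed state in Eq. (15)
X_box = [(-0.1, 0.1), (0.0, 0.1)]    # X = [-0.1, 0.1] x [0, 0.1]
unsafe_bound = 0.2                   # U = {(y, v) : |y| > 0.2}


def inf_norm_matrix(m):
    # Induced infinity-norm: maximum absolute row sum.
    return max(sum(abs(a) for a in row) for row in m)


def inf_norm_box(box):
    # Largest infinity-norm of any point in an axis-aligned box.
    return max(max(abs(lo), abs(hi)) for lo, hi in box)


# Closed form of int_0^r exp(-alpha * tau) d tau.
integral = (math.exp(-alpha * r) - 1.0) / (-alpha)
K_hat = K * (1.0 + inf_norm_matrix(B) * integral) * inf_norm_box(X_box)

# Eq. (13): from T* on, K_hat * exp(alpha * t) <= 0.2, so the box avoids U.
T_star = max(0.0, math.log(unsafe_bound / K_hat) / alpha)
print(f"K_hat = {K_hat:.5f}, T* = {T_star:.5f} s")   # K_hat = 2.21103, T* = 4.80579 s
```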

The box $[-\hat K e^{\alpha t}, \hat K e^{\alpha t}]^n$ in Eq. (13) can be viewed as an overapproximation of all trajectories originating from $\mathcal{X}$. As shown in the right part of Fig. 1, however, this overapproximation is clearly too conservative to be used for proving or disproving almost any safety specification of practical interest. The contribution of our approach lies in the reduction of unbounded verification problems to their bounded counterparts, thereby yielding a quantitative time bound $T^*$ that substantially “trims off” the verification effort pertaining to $t \ge T^*$. The derived $T$-safety verification task can then be tackled effectively by methods dedicated to bounded verification of DDEs of the form (3), or more generally (1), e.g., the approaches in [17] and [4].



Fig. 1. Left: the rightmost roots of $h(z)$ identified by DDE-BIFTOOL and an upper bound $\alpha = -0.5$ such that $\max_{\lambda\in\sigma}\Re(\lambda) < \alpha < 0$; Center: $M = 11.9125$, which suffices to split and hence upper-bound the improper integral over $\nu$ in Eq. (11); Right: the obtained time instant $T^* = 4.80579$ s guaranteeing the equivalence of ∞-safety and $T$-safety of the PD-controller for any $T \ge T^*$.
4 Nonlinear Dynamics


In this section, we address a more general form of dynamics featuring substantial nonlinearity, by resorting to linearization techniques and thereby establishing a quantitative stability criterion, analogous to the linear case, for nonlinear delayed dynamics. Consider a singly delayed version of Eq. (1):


$$\begin{cases}\dot x(t) = f\big(x(t), x(t-r)\big), & t \in [0, \infty)\\ x(t) = \phi(t), & t \in [-r, 0]\end{cases} \qquad (16)$$


with $f$ being a nonlinear vector field involving possibly non-polynomial functions. Let

$$f(x, y) = Ax + By + g(x, y), \quad \text{with } A = f_x(0, 0),\ B = f_y(0, 0),$$

where $f_x$ and $f_y$ are the Jacobian matrices of $f$ in terms of $x$ and $y$, respectively; $g$ is a vector-valued, higher-order term whose Jacobian matrix at $(0, 0)$ is $O$.

By dropping the higher-order term $g$ in $f$, we get the linearized counterpart of Eq. (16):

$$\begin{cases}\dot x(t) = Ax(t) + Bx(t-r), & t \in [0, \infty)\\ x(t) = \phi(t), & t \in [-r, 0]\end{cases} \qquad (17)$$


which falls into the scope of the linear dynamics specified in Eq. (3), and is therefore associated with a characteristic equation of the same form as that in Eq. (4). Equation (17) will in the sequel be referred to as the linearization of Eq. (16) at the steady state 0, and $\sigma$ is used to denote the spectrum of the characteristic equation corresponding to Eq. (17).

In light of the well-known Hartman–Grobman theorem [18,20] in the realm of dynamical systems, the local behavior of a nonlinear dynamical system near a (hyperbolic) equilibrium is qualitatively the same as that of its linearization near this equilibrium. The following statement makes explicit the connection between the local asymptotic behavior of a nonlinear system and the spectrum of its linearization:


Theorem 4 (Locally exponential stability [6,36]). Suppose $\max_{\lambda\in\sigma}\Re(\lambda) < \alpha < 0$. Then $x = 0$ is a locally exponentially stable equilibrium of the nonlinear system (16). In fact, there exist $\delta > 0$ and $K > 0$ such that

$$\|\phi\| < \delta \implies \|\xi_\phi(t)\| \le K\|\phi\|e^{\alpha t}, \quad \forall t \ge 0,$$

where $\xi_\phi(t)$ is the solution to Eq. (16). If $\Re(\lambda) > 0$ for some $\lambda$ in $\sigma$, then $x = 0$ is unstable.


Akin to the linear case, Theorem 4 establishes an existential guarantee that the solution to the nonlinear delayed dynamics approaches the zero equilibrium exponentially for initial conditions within a $\delta$-neighborhood of this equilibrium. Constructing $\alpha$, $K$ and $\delta$ quantitatively, as is essential to our automatic verification approach, again requires the fundamental solution $\xi_{\phi'}(t)$ to the linearized dynamics in Eq. (17):


Lemma 5 (Variation-of-constants [19,36]). Consider nonhomogeneous systems of the form

$$\begin{cases}\dot x(t) = Ax(t) + Bx(t-r) + \eta(t), & t \in [0, \infty)\\ x(t) = \phi(t), & t \in [-r, 0]\end{cases} \qquad (18)$$

Let $\xi_\phi(t)$ be the solution to Eq. (18). Denote by $\xi_{\phi'}(t)$ the solution that satisfies Eq. (17) for $t \ge 0$ and satisfies a variation of the initial condition as $\phi'(0) = I$ and $\phi'(t) = O$ for all $t \in [-r, 0)$. Then for $t \ge 0$,

$$\xi_\phi(t) = \xi_{\phi'}(t)\phi(0) + \int_0^t \xi_{\phi'}(t-\tau)B\phi(\tau - r)\,d\tau + \int_0^t \xi_{\phi'}(t-\tau)\eta(\tau)\,d\tau, \qquad (19)$$

where $\phi$ is extended to $[-r, \infty)$ with $\phi(t) = 0$ for $t > 0$.
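To make the fundamental solution $\xi_{\phi'}$ concrete, the following sketch approximates it for a scalar DDE $\dot x(t) = a\,x(t) + b\,x(t-r)$ with the initial function $\phi'(0) = 1$, $\phi'(t) = 0$ on $[-r, 0)$, using forward Euler on a grid whose step divides $r$. The scheme, the step size, and the choice $a = 0$, $b = -1$, $r = 1$ (the linearization in Eq. (23) of Example 3) are illustrative assumptions. It then estimates $\sup_t e^{-\alpha t}|\xi_{\phi'}(t)|$ on a finite window, i.e. the quantity that $K$ must dominate; an Euler approximation is not a rigorous enclosure and does not replace the certified construction of Eq. (12).

```python
import math


def fundamental_solution(a, b, r, t_end, dt):
    """Forward-Euler approximation of xi_{phi'} on [0, t_end] for the scalar DDE
    x'(t) = a*x(t) + b*x(t - r), with phi'(0) = 1 and phi'(t) = 0 on [-r, 0)."""
    n_delay = round(r / dt)           # grid steps per delay (dt should divide r)
    n_steps = round(t_end / dt)
    xs = [1.0]                        # xi(0) = phi'(0) = 1
    for k in range(n_steps):
        # Before time r the delayed argument falls in [-r, 0), where phi' vanishes.
        delayed = xs[k - n_delay] if k >= n_delay else 0.0
        xs.append(xs[k] + dt * (a * xs[k] + b * delayed))
    return xs


dt, alpha = 0.001, -0.3
xs = fundamental_solution(a=0.0, b=-1.0, r=1.0, t_end=30.0, dt=dt)
# Empirical estimate of sup_t exp(-alpha * t) * |xi(t)| over the simulated window.
ratio = max(math.exp(-alpha * k * dt) * abs(x) for k, x in enumerate(xs))
print(round(ratio, 4))   # approximately 1.3499, attained at t = r = 1
```

On this window the supremum is attained at $t = r$, where $\xi_{\phi'}$ is still identically 1; afterwards the delayed feedback makes the solution oscillate with decaying amplitude.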


In what follows, we give a constructive quantitative estimation of the solutions to nonlinear dynamics, which reduces the problem of constructing an exponential upper bound for a nonlinear system to that for its linearization, as is immediately evident from the constructive proof.


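Theorem 5 (Exponential estimation). Suppose that $\max_{\lambda\in\sigma}\Re(\lambda) < \alpha < 0$. Then there exist $K > 0$ and $\delta > 0$ such that $\|\xi_{\phi'}(t)\| \le Ke^{\alpha t}$ for any $t \ge 0$, and

$$\|\phi\| < \delta \implies \|\xi_\phi(t)\| \le Ke^{-\alpha r}\left(1 + \|B\|\int_0^r e^{-\alpha\tau}\,d\tau\right)\|\phi\|\,e^{\alpha t/2}, \quad \forall t \ge 0,$$

where $\xi_\phi(t)$ is the solution to the nonlinear system (16) and $\xi_{\phi'}(t)$ is the fundamental solution to the linearized counterpart (17).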


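Proof. The existence of $K$ follows directly from Eq. (7) in Theorem 2. By the variation-of-constants formula (19), we have, for $t \ge 0$,

$$\xi_\phi(t) = \xi_{\phi'}(t)\phi(0) + \int_0^t \xi_{\phi'}(t-\tau)B\phi(\tau - r)\,d\tau + \int_0^t \xi_{\phi'}(t-\tau)\,g\big(x(\tau), x(\tau - r)\big)\,d\tau, \qquad (20)$$

where $\phi$ is extended to $[-r, \infty)$ with $\phi(t) = 0$ for $t > 0$. Define $x_t^\phi(\cdot) \in \mathcal{C}([-r, 0], \mathbb{R}^n)$ as $x_t^\phi(\theta) = \xi_\phi(t + \theta)$ for $\theta \in [-r, 0]$. Then $g(\cdot, \cdot)$ being a higher-order term yields that for any $\epsilon > 0$, there exists $\delta_\epsilon > 0$ such that $\|x_t^\phi\| \le \delta_\epsilon$ implies $\|g(x(t), x(t - r))\| \le \epsilon\|x_t^\phi\|$. Due to the fact that $\|\xi_{\phi'}(t)\| \le Ke^{\alpha t}$ and the monotonicity of $Ke^{\alpha t}$ for $\alpha < 0$, we have $\|\xi_{\phi'}(t + \theta)\| \le Ke^{\alpha(t - r)}$ for all $\theta \in [-r, 0]$. This, together with Eq. (20), leads to

$$\begin{aligned}
\|x_t^\phi\| &\le K\|\phi\|e^{\alpha(t-r)} + \int_0^r K\|B\|\|\phi\|e^{\alpha(t-\tau-r)}\,d\tau + \int_0^t Ke^{\alpha(t-\tau-r)}\,\epsilon\|x_\tau^\phi\|\,d\tau\\
&= K\left(1 + \|B\|\int_0^r e^{-\alpha\tau}\,d\tau\right)\|\phi\|e^{\alpha(t-r)} + K\epsilon e^{\alpha(t-r)}\int_0^t e^{-\alpha\tau}\|x_\tau^\phi\|\,d\tau.
\end{aligned}$$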
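Hence,

$$e^{-\alpha t}\|x_t^\phi\| \le Ke^{-\alpha r}\left(1 + \|B\|\int_0^r e^{-\alpha\tau}\,d\tau\right)\|\phi\| + K\epsilon e^{-\alpha r}\int_0^t e^{-\alpha\tau}\|x_\tau^\phi\|\,d\tau.$$

By the Gronwall–Bellman inequality [1] we obtain

$$e^{-\alpha t}\|x_t^\phi\| \le Ke^{-\alpha r}\left(1 + \|B\|\int_0^r e^{-\alpha\tau}\,d\tau\right)\|\phi\|\,e^{K\epsilon e^{-\alpha r}t}$$

and thus

$$\|x_t^\phi\| \le Ke^{-\alpha r}\left(1 + \|B\|\int_0^r e^{-\alpha\tau}\,d\tau\right)\|\phi\|\,e^{(\alpha + K\epsilon e^{-\alpha r})t}.$$

Set $\epsilon < -\alpha/(2Ke^{-\alpha r})$, so that $\alpha + K\epsilon e^{-\alpha r} < \alpha/2$, and $\delta = \min\left\{\delta_\epsilon,\ \delta_\epsilon \big/ \left(Ke^{-\alpha r}\left(1 + \|B\|\int_0^r e^{-\alpha\tau}\,d\tau\right)\right)\right\}$. With this choice, $\|\phi\| < \delta$ keeps $\|x_t^\phi\| < \delta_\epsilon$ for all $t \ge 0$ (by a standard continuation argument), so the estimate on $g$ applies throughout. This yields, for any $t \ge 0$,

$$\|\phi\| < \delta \implies \|\xi_\phi(t)\| \le Ke^{-\alpha r}\left(1 + \|B\|\int_0^r e^{-\alpha\tau}\,d\tau\right)\|\phi\|\,e^{\alpha t/2},$$

completing the proof.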
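For reference, the form of the Gronwall–Bellman inequality invoked in the proof above, and its instantiation there, can be written as follows. This is our restatement of a standard result; the symbols $u$, $c_0$ and $c_1$ are introduced here only for readability.

```latex
\begin{gather*}
u(t) \le c_0 + c_1 \int_0^t u(\tau)\,d\tau \ \ (t \ge 0),\quad c_1 \ge 0,\ u \text{ continuous}
\;\Longrightarrow\; u(t) \le c_0\, e^{c_1 t}, \\[4pt]
u(t) = e^{-\alpha t}\|x_t^\phi\|,\qquad
c_0 = K e^{-\alpha r}\Bigl(1 + \|B\|\int_0^r e^{-\alpha\tau}\,d\tau\Bigr)\|\phi\|,\qquad
c_1 = K\epsilon\, e^{-\alpha r}, \\[4pt]
\|x_t^\phi\| \le c_0\, e^{(\alpha + c_1)t},\qquad
\epsilon < \frac{-\alpha}{2Ke^{-\alpha r}} \;\Longrightarrow\; \alpha + c_1 < \frac{\alpha}{2}.
\end{gather*}
```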


The above constructive quantitative estimation of the solutions to nonlinear dynam- 
ics gives rise to the reduction, analogous to the linear case, of unbounded verification 
problems to bounded ones, in the presence of a local stability criterion. 


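Theorem 6 (Equivalence of safety properties). Given an initial state set $\mathcal{X} \subseteq \mathbb{R}^n$ and bad states $\mathcal{U} \subseteq \mathbb{R}^n$ satisfying $0 \notin \mathcal{U}$, let $\sigma$ denote the spectrum of the characteristic equation corresponding to Eq. (17). Suppose that $\max_{\lambda\in\sigma}\Re(\lambda) < \alpha < 0$, and that the fundamental solution to Eq. (17) satisfies $\|\xi_{\phi'}(t)\| \le Ke^{\alpha t}$ for any $t \ge 0$. Let $\hat K = Ke^{-\alpha r}\left(1 + \|B\|\int_0^r e^{-\alpha\tau}\,d\tau\right)\|\mathcal{X}\|$. Then there exist $\delta > 0$ and $T^* < \infty$, defined as

$$T^* = \max\left\{0,\ \inf\left\{T \mid \forall t \ge T\colon [-\hat K e^{\alpha t/2}, \hat K e^{\alpha t/2}]^n \cap \mathcal{U} = \emptyset\right\}\right\},$$

such that if $\|\mathcal{X}\| < \delta$, then for any $T \ge T^*$, the system (16) is ∞-safe iff it is $T$-safe.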


Proof. The proof is analogous to that of Theorem 3, particularly following from the 
local stability property stated in Theorem 5. 


Note that for nonlinear dynamics, the equivalence of safety claimed by Theorem 6 holds only under the condition $\|\mathcal{X}\| < \delta$, due to the locality stemming from linearization. In fact, a set $\mathcal{B} \subseteq \mathbb{R}^n$ satisfying $\|\mathcal{B}\| < \delta$ describes (a subset of) the basin of attraction around the local attractor 0, in the sense that any initial condition in $\mathcal{B}$ eventually leads the trajectory into the attractor. Consequently, for verification problems where $\mathcal{X} \supseteq \mathcal{B}$, if the reachable set originating from $\mathcal{X}$ is guaranteed to be contained in $\mathcal{B}$ over the time interval $[T' - r, T']$, then $T' + T^*$ suffices as a bound to avoid unbounded verification, namely for any $T \ge T' + T^*$, the system is ∞-safe iff it is $T$-safe. This is further demonstrated by the following example.
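The argument above combines a bounded reachability result with Theorem 6. A minimal sketch of that bookkeeping for a scalar system is given below. The `Segment` record, the function names, and the assumption that a bounded verifier returns interval enclosures over time segments are all hypothetical; the sketch trusts those enclosures rather than computing them, and `t_star` is assumed to have been computed for initial functions of norm below $\delta$.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical bounded-verifier output: x(t) lies in [lo, hi] for all t in [t0, t1]."""
    t0: float
    t1: float
    lo: float
    hi: float

    def radius(self) -> float:
        return max(abs(self.lo), abs(self.hi))


def covers(segments, start, end):
    """True if the segments, taken together, leave no gap on [start, end]."""
    reached = start
    for s in sorted(segments, key=lambda seg: seg.t0):
        if s.t0 > reached:
            break                       # gap before this segment
        reached = max(reached, s.t1)
    return reached >= end


def certifies_unbounded_safety(segments, r, t_prime, t_star, delta, unsafe_bound):
    """Scalar case with U = {x : |x| > unsafe_bound}:
    (i) the enclosures cover [-r, t_prime + t_star] and stay out of U, and
    (ii) on [t_prime - r, t_prime] they lie inside the open delta-ball, so
    Theorem 6 applies to the state x_{t_prime} from time t_prime on."""
    horizon = t_prime + t_star
    relevant = [s for s in segments if s.t0 <= horizon]
    if not covers(relevant, -r, horizon):
        return False
    if any(s.radius() > unsafe_bound for s in relevant):
        return False
    window = [s for s in segments if s.t1 > t_prime - r and s.t0 <= t_prime]
    return bool(window) and all(s.radius() < delta for s in window)
```

The check is deliberately conservative: any enclosure that starts before the horizon must avoid $\mathcal{U}$ over its whole time span.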
Example 3 (Population dynamics [4,25]). Consider a slightly modified version of the delayed logistic equation introduced by G. Hutchinson in 1948 (cf. [22])

$$\dot N(t) = N(t)\,[1 - N(t-r)], \quad t \ge 0, \qquad (21)$$


which is used to model a single population whose per-capita rate of growth $\dot N(t)/N(t)$ depends on the population size $r$ time units in the past. This would be a reasonable model for a population that features a significant minimum reproductive age or depends on a resource, like food, that needs time to grow and thus to recover its availability.

If we change variables, putting $u = N - 1$, then Eq. (21) becomes the famous Wright's equation (see [44]):

$$\dot u(t) = -u(t-r)\,[1 + u(t)], \quad t \ge 0. \qquad (22)$$

The steady state $N = 1$ is now $u = 0$. We instantiate the verification problem of Eq. (22) over $[-r, \infty)$ as $\mathcal{X} = [-0.2, 0.2]$ and $\mathcal{U} = \{u \mid |u| > 0.6\}$, under a constant delay $r = 1$. Note that delay-independent Lyapunov techniques, e.g. [32], cannot solve this problem, since Wright's conjecture [44], which has recently been proven in [40], together with its corollaries implies that there does not exist a Lyapunov functional guaranteeing absolute stability of Eq. (22) with arbitrary constant delays. To achieve an exponential estimation, we first linearize the dynamics by dropping the nonlinearity $u(t)u(t-r)$:

$$\dot v(t) = -v(t-1), \quad t \ge 0. \qquad (23)$$
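The rightmost characteristic roots of Eq. (23), which the approach obtains from DDE-BIFTOOL, can be cross-checked in a few lines. The sketch below is our illustration, not part of the described toolchain: it applies plain Newton iteration to the characteristic function $h(\lambda) = \lambda + e^{-\lambda}$ (that is, $\det(\lambda I - A - Be^{-\lambda r})$ with $A = 0$, $B = -1$, $r = 1$) from an initial guess near the rightmost pair. Newton only returns the root nearest its starting point and does not by itself certify that no root lies further to the right.

```python
import cmath


def newton_root(h, dh, z0, tol=1e-12, max_iter=50):
    """Plain Newton iteration for a root of the analytic function h near z0."""
    z = z0
    for _ in range(max_iter):
        step = h(z) / dh(z)
        z -= step
        if abs(step) < tol:
            return z
    raise RuntimeError("Newton iteration did not converge")


def h(lam):
    # Characteristic function of Eq. (23): lam - A - B*exp(-lam*r) with A = 0, B = -1, r = 1.
    return lam + cmath.exp(-lam)


def dh(lam):
    # Derivative of h with respect to lam.
    return 1 - cmath.exp(-lam)


root = newton_root(h, dh, complex(-0.3, 1.3))
print(root)               # approx. (-0.3181315+1.3372357j)
print(root.real < -0.3)   # True, consistent with the choice alpha = -0.3
```

Since $\lambda + e^{-\lambda} = 0$ is equivalent to $\lambda e^{\lambda} = -1$, the computed root coincides with the principal-branch Lambert W value $W_0(-1) \approx -0.3181 + 1.3372i$; its conjugate is also a root.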


Following our constructive approach, we automatically obtain for Eq. (23) $\alpha = -0.3$ (see the left of Fig. 2), $M = 2.69972$, $K = 3.28727$, and thereby for Eq. (22) $\delta = 0.00351678$, $\hat K = 0.0338039$ and $T^* = 0$ s. It is worth highlighting that, using the bounded verification method in [17] with Taylor models of order 5, an overapproximation $\Omega$ of the reachable set w.r.t. system (22) over the time interval $[14.5, 15.5]$ was verified to be enclosed in the $\delta$-neighborhood of 0, i.e., $\|\Omega\| < \delta$. The overapproximation, however, escaped from this region around $t = 55.3$ s and tended to diverge soon afterwards, as depicted in the right part of Fig. 2, so that method alone cannot prove unbounded safety properties. In contrast, with our result of $T^* = 0$ s and the fact that the overapproximation over $[-1, 15.5]$ is disjoint from $\mathcal{U}$, we are able to claim safety of the underlying system over an infinite time domain.
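The following sketch retraces how $\delta$ and $\hat K$ follow from the reported $K$ and $\alpha$ via the proof of Theorem 5 and Theorem 6. Three assumptions are ours: first, that $\delta_\epsilon = \epsilon$ for this nonlinearity, which holds because $|u(t)u(t-r)| \le \|x_t\|^2$ for the sup-norm over $[-r, 0]$; second, that $\epsilon$ is taken at its supremum $-\alpha/(2Ke^{-\alpha r})$; third, that $\|\mathcal{X}\|$ in $\hat K$ is replaced by $\delta$, since from $T'$ on the state lies in the $\delta$-neighborhood. Under these assumptions the reported $\delta$ and $\hat K$ are matched up to the last printed digit.

```python
import math

# Constants reported for Example 3 (linearization (23) of Wright's equation (22)).
K, alpha, r = 3.28727, -0.3, 1.0
norm_B = 1.0                  # B = -1 in Eq. (23)
unsafe_bound = 0.6            # U = {u : |u| > 0.6}

# Factor K e^{-alpha r} (1 + ||B|| int_0^r e^{-alpha tau} d tau) from Theorem 5.
integral = (math.exp(-alpha * r) - 1.0) / (-alpha)
C = K * math.exp(-alpha * r) * (1.0 + norm_B * integral)

# Proof of Theorem 5: epsilon must stay below -alpha / (2 K e^{-alpha r});
# this sketch takes that supremum (assumption).
eps = -alpha / (2.0 * K * math.exp(-alpha * r))

# Assumption: |u(t) u(t - r)| <= ||x_t||^2, hence delta_eps = eps for this g.
delta_eps = eps
delta = min(delta_eps, delta_eps / C)
K_hat = C * delta             # ||X|| replaced by delta (state inside the delta-ball)

# Theorem 6: the box [-K_hat e^{alpha t/2}, K_hat e^{alpha t/2}] must avoid |u| > 0.6.
T_star = max(0.0, 2.0 * math.log(unsafe_bound / K_hat) / alpha)
print(f"delta = {delta:.8f}, K_hat = {K_hat:.7f}, T* = {T_star} s")
```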


DDEs with Multiple Different Delays. Delay differential equations with multiple fixed discrete delays are extensively used in the literature to model practical systems where components coupled with different time lags coexist and interact with each other. We remark that the previous theorems on exponential estimation and equivalence of safety for the single-delay case extend immediately to systems of the form (1) with almost no change, except for replacing $\|B\|e^{-\alpha r}$ with $\sum_{i=1}^{k}\|A_i\|e^{-\alpha r_i}$ and $\|B\|$ with $\sum_{i=1}^{k}\|A_i\|$, where $A_i$ denotes the matrix attached to $x(t - r_i)$ in the linearization. For a slightly modified form of the variation-of-constants formula under multiple delays, we refer the reader to Theorem 1.2 in [19].
Fig. 2. Left: the identified rightmost roots of $h(z)$ and an upper bound $\alpha = -0.3$ such that $\max_{\lambda\in\sigma}\Re(\lambda) < \alpha < 0$; Right: overapproximation of the reachable set of the system (22) produced by the method in [17] using Taylor models for bounded verification. Together with this overapproximation, we prove the equivalence of ∞-safety and $T$-safety of the system for any $T \ge T' + T^* = 15.5$ s.


5 Implementation and Experimental Results 


To further investigate the scalability and efficiency of our constructive approach, we have carried out a prototypical implementation⁴ in Wolfram MATHEMATICA, which was selected due to its built-in primitives for integration and matrix operations. By interfacing with DDE-BIFTOOL⁵ (in MATLAB or GNU OCTAVE) for identifying the rightmost characteristic roots of linear (or linearized) DDEs, our implementation computes an appropriate $T^*$ that admits a reduction of unbounded verification problems to bounded ones. A set of benchmark examples from the literature has been evaluated on a 3.6 GHz Intel Core-i7 processor with 8 GB RAM running 64-bit Ubuntu 16.04. All computations of $T^*$ were safely rounded and finished within 6 s for each of the examples, including Examples 2 and 3. In what follows, we demonstrate in particular the applicability of our technique to DDEs featuring non-polynomial dynamics, high dimensionality and multiple delays.


Example 4 (Disease pathology [25,27,32]). Consider the following non-polynomial DDE for $t \ge 0$:

$$\dot p(t) = \frac{\beta\,\theta^n\,p(t-r)}{\theta^n + p^n(t-r)} - \gamma\,p(t), \qquad (24)$$

where $p(t)$ is positive and indicates the number of mature blood cells in circulation, while $r$ models the delay between cell production and cell maturation. We consider the case $\theta = 1$ as in [32]. Constants are instantiated as $n = 1$, $\beta = 0.5$, $\gamma = 0.6$ and $r = 0.5$. The unbounded verification problem of Eq. (24) over $[-r, \infty)$ is configured as $\mathcal{X} = [0, 0.2]$ and $\mathcal{U} = \{p \mid |p| > 0.3\}$. Then the linearization of Eq. (24) reads

$$\dot p(t) = -0.6\,p(t) + 0.5\,p(t - 0.5). \qquad (25)$$


4 http://lcs.ios.ac.cn/~chenms/tools/UDDER.tar.bz2.
5 http://ddebiftool.sourceforge.net/.
With $\alpha = -0.07$ obtained from DDE-BIFTOOL, our implementation produces for Eq. (25) the values $M = 2.23562$, $K = 1.75081$, and thereby for Eq. (24) $\delta = 0.0163426$, $\hat K = 0.0371712$ and $T^* = 0$ s. Thereafter, using the bounded verification method in [17] with Taylor models of order 5, an overapproximation of the reachable set w.r.t. system (24) over the time interval $[25.45, 25.95]$ was verified to be enclosed in the $\delta$-neighborhood of 0. This fact, together with $T^* = 0$ s and the overapproximation on $[-0.5, 25.95]$ being disjoint from $\mathcal{U}$, yields safety of the system (24) over $[-0.5, \infty)$.
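Two ingredients of Example 4 can be checked numerically. For a scalar equation $\dot p(t) = a\,p(t) + b\,p(t-r)$ with $b > 0$, the rightmost characteristic root is real (the code comment gives the short argument), so a bisection on the real line confirms $\max_{\lambda\in\sigma}\Re(\lambda) < \alpha = -0.07$ without DDE-BIFTOOL. The sketch then recomputes the constant of Theorem 5 and checks that the reported $\hat K$ equals that constant times $\delta$, which is our reading of the reported values (i.e. $\|\mathcal{X}\|$ replaced by $\delta$), not a statement from the text.

```python
import math

# Linearization (25): p'(t) = a p(t) + b p(t - r).
a, b, r = -0.6, 0.5, 0.5
K, alpha, delta = 1.75081, -0.07, 0.0163426   # values reported in Example 4


def char_real(x):
    # Real restriction of h(lam) = lam - a - b * exp(-lam * r); increasing in x since b > 0.
    return x - a - b * math.exp(-x * r)


def bisect_root(f, lo, hi, tol=1e-12):
    # Requires f(lo) < 0 < f(hi) and f increasing on [lo, hi].
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# For b > 0, a root x + iy satisfies x - a - b e^{-xr} cos(yr) = 0, so
# char_real(x) = b e^{-xr} (cos(yr) - 1) <= 0 and x cannot exceed the real root s.
s = bisect_root(char_real, -1.0, 0.0)
print(s, s < alpha)           # real root near -0.0797, below alpha = -0.07

# Constant from Theorem 5, and K_hat = C * delta (||X|| replaced by delta).
integral = (math.exp(-alpha * r) - 1.0) / (-alpha)
C = K * math.exp(-alpha * r) * (1.0 + abs(b) * integral)
print(C * delta)              # close to the reported K_hat = 0.0371712
```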


Example 5 (Gene regulation [12,36]). To examine the scalability of our technique to higher dimensions, we recall an instantiation of Eq. (2) by setting $n = 5$, namely with 5 state components $x = (x_1; \ldots; x_5)$ and 5 delay terms $r = (0.1; 0.2; 0.4; 0.8; 1.6)$ involved, $g(x) = -x$, $\beta_j = 1$ for $j = 1, \ldots, 5$, $\mathcal{X} = \mathcal{B}((1;1;1;1;1), 0.2)$ and $\mathcal{U} = \{x \mid |x_1| > 1.5\}$. With $\alpha = -0.04$ derived from DDE-BIFTOOL, our implementation returns $M = 64.264$, $K = 4.42207$, $\hat K = 49.1463$ and $T^* = 87.2334$ s, thereby yielding the equivalence of ∞-safety to $T$-safety for any $T \ge T^*$. Furthermore, the safety guarantee issued by the bounded verification method in [4], based on rigorous simulations up to $T = 88$ s, suffices to prove safety of the system over an infinite time horizon.
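The reported $T^*$ can be recovered from $\hat K$ and $\alpha$ alone: the box $[-\hat K e^{\alpha t}, \hat K e^{\alpha t}]^5$ avoids $\mathcal{U}$ exactly when $\hat K e^{\alpha t} \le 1.5$. The sketch below takes the reported constants as given; it does not recompute $\hat K$ for the multiple-delay case.

```python
import math

K_hat, alpha = 49.1463, -0.04   # values reported in Example 5
unsafe_bound = 1.5              # U = {x : |x_1| > 1.5}

# The box [-K_hat e^{alpha t}, K_hat e^{alpha t}]^5 avoids U iff K_hat e^{alpha t} <= 1.5.
T_star = max(0.0, math.log(unsafe_bound / K_hat) / alpha)
print(f"T* = {T_star:.4f} s")   # T* = 87.2334 s
print(88.0 >= T_star)           # True: the horizon T = 88 s is long enough
```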


6 Conclusion 


We have presented a constructive method, based on linearization and spectral analysis, 
for computing a delay-dependent, exponentially decreasing upper bound, if one exists,
that encloses trajectories of a DDE originating from a certain set of initial functions. We 
showed that such an enclosure facilitates a reduction of the verification problem over 
an unbounded temporal horizon to a bounded one. Preliminary experimental results on 
a set of representative benchmarks from the literature demonstrate that our technique 
effectively extends the scope of existing bounded verification techniques to unbounded 
verification tasks. 

Looking ahead, we plan to pursue a tight integration of our technique into several automatic tools dedicated to bounded verification of DDEs, and to investigate more permissive forms of stability, e.g., asymptotic stability, that may admit a similar reduction-based idea. An extension of our method to more general forms of DDEs, e.g., with time-varying or distributed (i.e., a weighted average of) delays, will also be of interest. Additionally, we expect to refine our enclosure of system trajectories by resorting to a topologically finite partition of the initial set of functions.